Files

file tsgAcceleratedDataStructures.hpp
    Data structures for interacting with CUDA and MAGMA environments.
file tsgCacheLagrange.hpp
    Cache data structure for evaluation with Global grids.
Classes

struct TasGrid::HandleDeleter< ehandle >
    Deleter template for the GPU handles, e.g., cuBlas and rocBlas. More...
class TasGrid::GpuVector< T >
    Template class that wraps around a single GPU array, providing functionality that mimics std::vector. More...
struct TasGrid::GpuEngine
    Wrapper class around calls to GPU accelerated linear algebra libraries. More...
class TasGrid::AccelerationDomainTransform
    Implements the domain transform algorithms in case the user data is provided on the GPU. More...
struct TasGrid::AccelerationContext
    Wrapper class around the GPU device ID, acceleration type, and GpuEngine. More...
class TasGrid::CacheLagrange< T >
    Cache that holds the values of 1D Lagrange polynomials. More...
class TasGrid::CacheLagrangeDerivative< T >
    Cache that holds the derivatives of 1D Lagrange polynomials. Uses the same interface as CacheLagrange. More...
Functions

template<typename >
void TasGrid::deleteHandle (int *)
    Deletes the handle, specialized for each TPL backend and tag in the TasGrid::AccHandle namespace. More...
template<typename T >
void TasGrid::TasGpu::dtrans2can (AccelerationContext const *acc, bool use01, int dims, int num_x, int pad_size, const double *gpu_trans_a, const double *gpu_trans_b, const T *gpu_x_transformed, T *gpu_x_canonical)
    Uses a custom kernel to convert transformed points to canonical points; all arrays live on the CUDA device. More...
template<typename T >
void TasGrid::TasGpu::devalpwpoly (AccelerationContext const *acc, int order, TypeOneDRule rule, int num_dimensions, int num_x, int num_basis, const T *gpu_x, const T *gpu_nodes, const T *gpu_support, T *gpu_y)
    Evaluate the basis functions for a local polynomial grid using the DENSE algorithm. More...
template<typename T >
void TasGrid::TasGpu::devalpwpoly_sparse (AccelerationContext const *acc, int order, TypeOneDRule rule, int dims, int num_x, const T *gpu_x, const GpuVector< T > &gpu_nodes, const GpuVector< T > &gpu_support, const GpuVector< int > &gpu_hpntr, const GpuVector< int > &gpu_hindx, const GpuVector< int > &gpu_hroots, GpuVector< int > &gpu_spntr, GpuVector< int > &gpu_sindx, GpuVector< T > &gpu_svals)
    Evaluate the basis functions for a local polynomial grid using the SPARSE algorithm. More...
template<typename T >
void TasGrid::TasGpu::devalseq (AccelerationContext const *acc, int dims, int num_x, const std::vector< int > &max_levels, const T *gpu_x, const GpuVector< int > &num_nodes, const GpuVector< int > &points, const GpuVector< T > &nodes, const GpuVector< T > &coeffs, T *gpu_result)
    Evaluate the basis for a Sequence grid. More...
template<typename T >
void TasGrid::TasGpu::devalfor (AccelerationContext const *acc, int dims, int num_x, const std::vector< int > &max_levels, const T *gpu_x, const GpuVector< int > &num_nodes, const GpuVector< int > &points, T *gpu_wreal, typename GpuVector< T >::value_type *gpu_wimag)
    Evaluate the basis for a Fourier grid. More...
template<typename T >
void TasGrid::TasGpu::devalglo (AccelerationContext const *acc, bool is_nested, bool is_clenshawcurtis0, int dims, int num_x, int num_p, int num_basis, T const *gpu_x, GpuVector< T > const &nodes, GpuVector< T > const &coeff, GpuVector< T > const &tensor_weights, GpuVector< int > const &nodes_per_level, GpuVector< int > const &offset_per_level, GpuVector< int > const &map_dimension, GpuVector< int > const &map_level, GpuVector< int > const &active_tensors, GpuVector< int > const &active_num_points, GpuVector< int > const &dim_offsets, GpuVector< int > const &map_tensor, GpuVector< int > const &map_index, GpuVector< int > const &map_reference, T *gpu_result)
    Evaluate the basis for a Global grid. More...
void TasGrid::TasGpu::fillDataGPU (AccelerationContext const *acc, double value, long long N, long long stride, double data[])
    Fills the data with the provided real number at the given stride.
template<typename T >
void TasGrid::TasGpu::load_n (AccelerationContext const *acc, T const *cpu_data, size_t num_entries, T *gpu_data)
    Similar to copy_n, copies the data from the CPU to the GPU.
template<typename T , typename U >
Utils::use_if<!std::is_same< U, T >::value > TasGrid::TasGpu::load_n (AccelerationContext const *acc, U const *cpu_data, size_t num_entries, T *gpu_data)
    Similar to copy_n, copies the data from the CPU to the GPU.
TypeAcceleration TasGrid::AccelerationMeta::getIOAccelerationString (const char *name)
    Convert the string (coming from C or Python) into an enumerated type. More...
const char * TasGrid::AccelerationMeta::getIOAccelerationString (TypeAcceleration accel)
    Convert the enumerated type to a string, the inverse of getIOAccelerationString(). More...
int TasGrid::AccelerationMeta::getIOAccelerationInt (TypeAcceleration accel)
    Convert the enumerated type (going to Fortran) into an integer. More...
TypeAcceleration TasGrid::AccelerationMeta::getIOIntAcceleration (int accel)
    Convert the integer (coming from Fortran) into an enumerated type, the inverse of getIOAccelerationInt(). More...
bool TasGrid::AccelerationMeta::isAccTypeGPU (TypeAcceleration accel)
    Returns true if accel is cuda, cublas or magma. More...
TypeAcceleration TasGrid::AccelerationMeta::getAvailableFallback (TypeAcceleration accel)
    Implements fallback logic: if accel has been enabled through CMake then this returns accel, otherwise it returns the "next-best-thing". More...
int TasGrid::AccelerationMeta::getNumGpuDevices ()
    Return the number of visible GPU devices. More...
void TasGrid::AccelerationMeta::setDefaultGpuDevice (int deviceID)
    Selects the active device for this CPU thread; not supported for DPC++. More...
unsigned long long TasGrid::AccelerationMeta::getTotalGPUMemory (int deviceID)
    Return the memory available in the device (in units of bytes). More...
std::string TasGrid::AccelerationMeta::getGpuDeviceName (int deviceID)
    Returns the name of the selected GPU device, or an empty string if no device is available or the index is out of bounds. More...
template<typename T >
void TasGrid::AccelerationMeta::recvGpuArray (AccelerationContext const *, size_t num_entries, const T *gpu_data, std::vector< T > &cpu_data)
    Copy a device array to the main memory; used for testing only, always favor using GpuVector (if possible). More...
template<typename T >
void TasGrid::AccelerationMeta::delGpuArray (AccelerationContext const *, T *x)
    Deallocate a device array; used primarily for testing, always favor using GpuVector (if possible). More...
void * TasGrid::AccelerationMeta::createCublasHandle ()
    Creates a new cuBlas handle, used in unit-testing only.
void TasGrid::AccelerationMeta::deleteCublasHandle (void *)
    Destroys the cuBlas handle, used in unit-testing only.
template<typename >
void TasGrid::deleteHandle (int *)
Deletes the handle, specialized for each TPL backend and tag in the TasGrid::AccHandle namespace.
template<typename T >
void TasGrid::TasGpu::dtrans2can (AccelerationContext const *acc, bool use01, int dims, int num_x, int pad_size, const double *gpu_trans_a, const double *gpu_trans_b, const T *gpu_x_transformed, T *gpu_x_canonical)
Uses a custom kernel to convert transformed points to canonical points; all arrays live on the CUDA device.
The array gpu_x_transformed is provided by the user and must have size num_x times dims. The points are stored contiguously in strides of dims (identical to all other calls). To facilitate contiguous memory access, it is most efficient to assign each thread to a different dimension, but the number of dimensions is much smaller than the number of threads. Thus, the transforms gpu_trans_a and gpu_trans_b are padded as if they were applied to much larger vectors (of dimension pad_size). The pad_size must be a multiple of dims, but num_x does not have to divide evenly into pad_size. The arrays gpu_trans_a and gpu_trans_b must have length pad_size and must contain the rate and shift of the transform (not the upper and lower limits).
The use01 flag indicates whether to use the canonical interval (0, 1) or (-1, 1). This is called from AccelerationDomainTransform::getCanonicalPoints().
See the implementation of AccelerationDomainTransform::load() for more details.
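The per-entry arithmetic can be illustrated with a plain CPU sketch (not the library kernel). It assumes a transform of the form x_canonical = rate * x_transformed - shift applied independently in each dimension; the exact rate/shift convention and the padding layout are defined by AccelerationDomainTransform::load(), and the function name below is hypothetical.

#include <vector>

// Hypothetical CPU analogue of dtrans2can: applies a per-dimension rate/shift
// transform to points stored contiguously in strides of dims. The GPU kernel
// reads rate/shift from arrays padded to pad_size (a multiple of dims) so that
// consecutive threads access consecutive entries; the per-entry result is the same.
std::vector<double> trans2can_reference(int dims, int num_x,
                                        std::vector<double> const &rate,   // size dims
                                        std::vector<double> const &shift,  // size dims
                                        std::vector<double> const &x_transformed){
    std::vector<double> x_canonical(x_transformed.size());
    for(int i=0; i<num_x; i++)
        for(int j=0; j<dims; j++)
            x_canonical[i * dims + j] = rate[j] * x_transformed[i * dims + j] - shift[j];
    return x_canonical;
}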
template<typename T >
void TasGrid::TasGpu::devalpwpoly (AccelerationContext const *acc, int order, TypeOneDRule rule, int num_dimensions, int num_x, int num_basis, const T *gpu_x, const T *gpu_nodes, const T *gpu_support, T *gpu_y)
Evaluate the basis functions for a local polynomial grid using the DENSE algorithm.
Used for basis evaluations in GridLocalPolynomial with the given order (0, 1, or 2) and rule. The grid has num_dimensions and num_basis functions (equal to the number of points in the grid). The number of locations to evaluate is num_x; gpu_x must have size num_x times num_dimensions and must hold points on the canonical interval. The output matrix gpu_y has dimension num_basis (contiguous dimension) times num_x.
The grid nodes and the associated support are "encoded" in gpu_nodes and gpu_support, where negative support is used to handle special cases such as global support on level 1 (rule semi-localp). The interpretation of the negative support must be synchronized between the kernel tasgpu_devalpwpoly_feval() and GridLocalPolynomial::encodeSupportForGPU().
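A rough CPU analogue of the output layout is sketched below. It uses a simple order-1 hat function and an assumed per-basis node/support layout as stand-ins, since the real rules and the negative-support encoding live in tasgpu_devalpwpoly_feval() and GridLocalPolynomial::encodeSupportForGPU(); only the indexing (num_basis contiguous, then num_x) reflects the documented layout, and the function name is hypothetical.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative dense evaluation: y[i * num_basis + b] = value of basis b at point i.
// The hat function max(0, 1 - |x - node| / support) stands in for the real basis.
std::vector<double> evalpwpoly_dense_reference(int num_dimensions, int num_x, int num_basis,
                                               std::vector<double> const &x,        // num_x * num_dimensions
                                               std::vector<double> const &nodes,    // assumed num_basis * num_dimensions
                                               std::vector<double> const &support){ // assumed num_basis * num_dimensions
    std::vector<double> y(static_cast<std::size_t>(num_x) * num_basis);
    for(int i=0; i<num_x; i++)
        for(int b=0; b<num_basis; b++){
            double v = 1.0; // tensor product over the dimensions
            for(int j=0; j<num_dimensions; j++){
                double d = std::abs(x[i * num_dimensions + j] - nodes[b * num_dimensions + j]);
                v *= std::max(0.0, 1.0 - d / support[b * num_dimensions + j]);
            }
            y[static_cast<std::size_t>(i) * num_basis + b] = v; // num_basis is the contiguous dimension
        }
    return y;
}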
template<typename T >
void TasGrid::TasGpu::devalpwpoly_sparse (AccelerationContext const *acc, int order, TypeOneDRule rule, int dims, int num_x, const T *gpu_x, const GpuVector< T > &gpu_nodes, const GpuVector< T > &gpu_support, const GpuVector< int > &gpu_hpntr, const GpuVector< int > &gpu_hindx, const GpuVector< int > &gpu_hroots, GpuVector< int > &gpu_spntr, GpuVector< int > &gpu_sindx, GpuVector< T > &gpu_svals)
Evaluate the basis functions for a local polynomial grid using the SPARSE algorithm.
The inputs are identical to devalpwpoly() with the addition of the hierarchy vectors gpu_hpntr, gpu_hindx, and gpu_hroots. The hierarchy vectors define a series of trees, one for each entry in gpu_hroots, and all tree nodes combined are indexed consecutively starting from 0. The gpu_hpntr vector holds the offsets of the children of each node and the indexes of the children are stored in gpu_hindx. The format is identical to a compressed sparse row matrix.
The output vectors gpu_spntr, gpu_sindx, and gpu_svals form a row-compressed matrix, e.g., in a format that can directly interface with Nvidia cusparseDcsrmm2().
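For reference, a row-compressed triplet like this can be traversed as sketched below once the three vectors are copied back to the CPU (e.g., with AccelerationMeta::recvGpuArray()). The sketch assumes one row per evaluation point, i.e., that spntr holds num_x + 1 offsets; the function name is hypothetical.

#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical CPU-side traversal of the row-compressed output: row r lists the
// indexes and values of the basis functions that are nonzero at point r.
void print_sparse_basis(std::vector<int> const &spntr, std::vector<int> const &sindx,
                        std::vector<double> const &svals){
    for(std::size_t row = 0; row + 1 < spntr.size(); row++)
        for(int k = spntr[row]; k < spntr[row + 1]; k++)
            std::printf("point %d, basis %d, value %f\n", static_cast<int>(row), sindx[k], svals[k]);
}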
template<typename T >
void TasGrid::TasGpu::devalseq (AccelerationContext const *acc, int dims, int num_x, const std::vector< int > &max_levels, const T *gpu_x, const GpuVector< int > &num_nodes, const GpuVector< int > &points, const GpuVector< T > &nodes, const GpuVector< T > &coeffs, T *gpu_result)
Evaluate the basis for a Sequence grid.
The evaluation points are defined on the canonical interval and given in gpu_x with size dims times num_x. The max_levels vector indicates the maximum level in each direction (one more than the vector stored in the GridSequence data structure); it is required to compute the offsets of the intermediate cache for the Newton polynomials. The points vector holds the same information as the MultiIndexSet, but in transposed order, i.e., the dimensions of the multi-indexes are contiguous. The kernel handles one dimension at a time, hence the switched order compared to the CPU, which handles one multi-index at a time. The nodes vector has the cached 1D nodes and coeffs holds the cached coefficients of the Newton polynomials.
The output is gpu_result, which must have dimension num_x by num_nodes.size() / dims.
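The role of the cached nodes and coefficients can be illustrated with a schematic 1D cache. This is not the library code: it assumes the k-th Sequence basis value at x is the product of (x - nodes[i]) for i < k divided by coeffs[k], which is the kind of quantity the kernel caches per dimension; the authoritative formula is the one used by GridSequence on the CPU, and the function name is hypothetical.

#include <vector>

// Schematic 1D Newton-style cache (illustrative only): cache[k] holds the assumed value
// of the k-th Sequence basis function at x, built incrementally so that per-dimension
// values can later be multiplied together for each multi-index.
std::vector<double> cache_sequence_basis_1d(double x, int num_nodes,
                                            std::vector<double> const &nodes,
                                            std::vector<double> const &coeffs){
    std::vector<double> cache(num_nodes);
    double w = 1.0; // running product of (x - nodes[i]) for i < k
    for(int k=0; k<num_nodes; k++){
        cache[k] = w / coeffs[k];
        w *= (x - nodes[k]);
    }
    return cache;
}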
template<typename T >
void TasGrid::TasGpu::devalfor (AccelerationContext const *acc, int dims, int num_x, const std::vector< int > &max_levels, const T *gpu_x, const GpuVector< int > &num_nodes, const GpuVector< int > &points, T *gpu_wreal, typename GpuVector< T >::value_type *gpu_wimag)
Evaluate the basis for a Fourier grid.
The logic is identical to devalseq(), except the Fourier basis functions do not require nodes or coefficients. The output is two real arrays of size num_x by num_nodes.size() / dims corresponding to the real and imaginary parts of the basis.
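As an illustration of the two output arrays, the sketch below evaluates the real and imaginary parts of one-dimensional Fourier basis functions, assuming basis functions of the form exp(2 pi i k x) on the canonical interval (0, 1); the actual frequency ordering is defined by the Fourier grid code, and the function name is hypothetical.

#include <cmath>
#include <vector>

// Illustrative 1D Fourier basis evaluation (not the library kernel): fills the real and
// imaginary parts of exp(2*pi*i*k*x) for k = 0, ..., num_freq - 1.
void fourier_basis_1d(double x, int num_freq,
                      std::vector<double> &wreal, std::vector<double> &wimag){
    constexpr double pi = 3.14159265358979323846;
    wreal.resize(num_freq);
    wimag.resize(num_freq);
    for(int k=0; k<num_freq; k++){
        wreal[k] = std::cos(2.0 * pi * k * x);
        wimag[k] = std::sin(2.0 * pi * k * x);
    }
}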
template<typename T >
void TasGrid::TasGpu::devalglo (AccelerationContext const *acc, bool is_nested, bool is_clenshawcurtis0, int dims, int num_x, int num_p, int num_basis, T const *gpu_x, GpuVector< T > const &nodes, GpuVector< T > const &coeff, GpuVector< T > const &tensor_weights, GpuVector< int > const &nodes_per_level, GpuVector< int > const &offset_per_level, GpuVector< int > const &map_dimension, GpuVector< int > const &map_level, GpuVector< int > const &active_tensors, GpuVector< int > const &active_num_points, GpuVector< int > const &dim_offsets, GpuVector< int > const &map_tensor, GpuVector< int > const &map_index, GpuVector< int > const &map_reference, T *gpu_result)
Evaluate the basis for a Global grid.
The logic is more complicated due to the more general nature of the grid.
TypeAcceleration TasGrid::AccelerationMeta::getIOAccelerationString (const char *name)
Convert the string (coming from C or Python) into an enumerated type.
const char * TasGrid::AccelerationMeta::getIOAccelerationString (TypeAcceleration accel)
Convert the enumerated type to a string, the inverse of getIOAccelerationString().
int TasGrid::AccelerationMeta::getIOAccelerationInt (TypeAcceleration accel)
Convert the enumerated type (going to Fortran) into an integer.
TypeAcceleration TasGrid::AccelerationMeta::getIOIntAcceleration (int accel)
Convert the integer (coming from Fortran) into an enumerated type, the inverse of getIOAccelerationInt().
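A possible round trip through these converters is sketched below. The include path and the string "gpu-cuda" are assumptions made only for illustration; the accepted spellings and integer codes are defined by the library.

#include "TasmanianSparseGrid.hpp" // assumed include path for the public headers

// Hypothetical round trip: string -> enum -> string, and enum -> int -> enum.
void acceleration_io_roundtrip(){
    TasGrid::TypeAcceleration acc =
        TasGrid::AccelerationMeta::getIOAccelerationString("gpu-cuda"); // assumed spelling
    const char *name = TasGrid::AccelerationMeta::getIOAccelerationString(acc);
    int code = TasGrid::AccelerationMeta::getIOAccelerationInt(acc);
    TasGrid::TypeAcceleration from_code = TasGrid::AccelerationMeta::getIOIntAcceleration(code);
    (void) name; (void) from_code; // values used only to show the inverse relationships
}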
bool TasGrid::AccelerationMeta::isAccTypeGPU (TypeAcceleration accel)
Returns true if accel is cuda, cublas or magma.
TypeAcceleration TasGrid::AccelerationMeta::getAvailableFallback (TypeAcceleration accel)
Implements fallback logic: if accel has been enabled through CMake then this returns accel, otherwise it returns the "next-best-thing".
This function always returns a valid acceleration type. The fallback logic is documented with the enumerated type TasGrid::TypeAcceleration.
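A minimal usage sketch, assuming the public header exposes these helpers: request a preferred mode and accept whatever the build configuration can actually provide.

#include "TasmanianSparseGrid.hpp" // assumed include path for the public headers

// Ask for CUDA acceleration; if the build lacks it, getAvailableFallback() returns
// the next-best enabled mode (possibly accel_none).
TasGrid::TypeAcceleration pick_acceleration(){
    return TasGrid::AccelerationMeta::getAvailableFallback(TasGrid::accel_gpu_cuda);
}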
int TasGrid::AccelerationMeta::getNumGpuDevices ()
Return the number of visible GPU devices.
void TasGrid::AccelerationMeta::setDefaultGpuDevice (int deviceID)
Selects the active device for this CPU thread; not supported for DPC++.
The deviceID must be a valid ID (between 0 and getNumGpuDevices() - 1).
unsigned long long TasGrid::AccelerationMeta::getTotalGPUMemory (int deviceID)
Return the memory available in the device (in units of bytes).
The deviceID must be a valid ID (between 0 and getNumGpuDevices() - 1).
std::string TasGrid::AccelerationMeta::getGpuDeviceName (int deviceID)
Returns the name of the selected GPU device, or an empty string if no device is available or the index is out of bounds.
The deviceID must be a valid ID (between 0 and getNumGpuDevices() - 1).
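The device queries above combine naturally into a small enumeration loop; a sketch assuming the public header exposes AccelerationMeta:

#include "TasmanianSparseGrid.hpp" // assumed include path for the public headers
#include <iostream>

// List every visible GPU device with its name and total memory in bytes.
void list_gpu_devices(){
    int num_devices = TasGrid::AccelerationMeta::getNumGpuDevices();
    for(int id = 0; id < num_devices; id++)
        std::cout << "device " << id << ": "
                  << TasGrid::AccelerationMeta::getGpuDeviceName(id)
                  << " (" << TasGrid::AccelerationMeta::getTotalGPUMemory(id) << " bytes)\n";
}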
template<typename T >
void TasGrid::AccelerationMeta::recvGpuArray (AccelerationContext const *, size_t num_entries, const T *gpu_data, std::vector< T > &cpu_data)
Copy a device array to the main memory; used for testing only, always favor using GpuVector (if possible).
template<typename T >
void TasGrid::AccelerationMeta::delGpuArray (AccelerationContext const *, T *x)
Deallocate a device array; used primarily for testing, always favor using GpuVector (if possible).