High-Performance Tensor Transposition (HPTT) C++ Library
A C++ library for high-performance multi-threaded tensor transpositions.
Classes | |
| class | ComputeNode |
| A ComputeNode encodes a loop. | |
| class | Plan |
| A plan encodes the execution of a tensor transposition. | |
| class | Transpose |
| The Transpose class encodes all information related to the execution of the tensor transposition. | |
Typedefs | |
| using | FloatComplex = std::complex< float > |
| using | DoubleComplex = std::complex< double > |
Enumerations | |
| enum | SelectionMethod { ESTIMATE , MEASURE , PATIENT , CRAZY } |
| Determines the duration of the auto-tuning process. | |
Functions | |
| std::shared_ptr< hptt::Transpose< float > > | create_plan (const int *perm, const int dim, const float alpha, const float *A, const int *sizeA, const int *outerSizeA, const float beta, float *B, const int *outerSizeB, const SelectionMethod selectionMethod, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false) |
| Creates a Tensor Transposition plan. | |
| std::shared_ptr< hptt::Transpose< double > > | create_plan (const int *perm, const int dim, const double alpha, const double *A, const int *sizeA, const int *outerSizeA, const double beta, double *B, const int *outerSizeB, const SelectionMethod selectionMethod, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false) |
| std::shared_ptr< hptt::Transpose< FloatComplex > > | create_plan (const int *perm, const int dim, const FloatComplex alpha, const FloatComplex *A, const int *sizeA, const int *outerSizeA, const FloatComplex beta, FloatComplex *B, const int *outerSizeB, const SelectionMethod selectionMethod, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false) |
| std::shared_ptr< hptt::Transpose< DoubleComplex > > | create_plan (const int *perm, const int dim, const DoubleComplex alpha, const DoubleComplex *A, const int *sizeA, const int *outerSizeA, const DoubleComplex beta, DoubleComplex *B, const int *outerSizeB, const SelectionMethod selectionMethod, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false) |
| std::shared_ptr< hptt::Transpose< float > > | create_plan (const std::vector< int > &perm, const int dim, const float alpha, const float *A, const std::vector< int > &sizeA, const std::vector< int > &outerSizeA, const float beta, float *B, const std::vector< int > &outerSizeB, const SelectionMethod selectionMethod, const int numThreads, const std::vector< int > &threadIds={}, const bool useRowMajor=false) |
| std::shared_ptr< hptt::Transpose< double > > | create_plan (const std::vector< int > &perm, const int dim, const double alpha, const double *A, const std::vector< int > &sizeA, const std::vector< int > &outerSizeA, const double beta, double *B, const std::vector< int > &outerSizeB, const SelectionMethod selectionMethod, const int numThreads, const std::vector< int > &threadIds={}, const bool useRowMajor=false) |
| std::shared_ptr< hptt::Transpose< FloatComplex > > | create_plan (const std::vector< int > &perm, const int dim, const FloatComplex alpha, const FloatComplex *A, const std::vector< int > &sizeA, const std::vector< int > &outerSizeA, const FloatComplex beta, FloatComplex *B, const std::vector< int > &outerSizeB, const SelectionMethod selectionMethod, const int numThreads, const std::vector< int > &threadIds={}, const bool useRowMajor=false) |
| std::shared_ptr< hptt::Transpose< DoubleComplex > > | create_plan (const std::vector< int > &perm, const int dim, const DoubleComplex alpha, const DoubleComplex *A, const std::vector< int > &sizeA, const std::vector< int > &outerSizeA, const DoubleComplex beta, DoubleComplex *B, const std::vector< int > &outerSizeB, const SelectionMethod selectionMethod, const int numThreads, const std::vector< int > &threadIds={}, const bool useRowMajor=false) |
| std::shared_ptr< hptt::Transpose< float > > | create_plan (const int *perm, const int dim, const float alpha, const float *A, const int *sizeA, const int *outerSizeA, const float beta, float *B, const int *outerSizeB, const int maxAutotuningCandidates, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false) |
| std::shared_ptr< hptt::Transpose< double > > | create_plan (const int *perm, const int dim, const double alpha, const double *A, const int *sizeA, const int *outerSizeA, const double beta, double *B, const int *outerSizeB, const int maxAutotuningCandidates, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false) |
| std::shared_ptr< hptt::Transpose< FloatComplex > > | create_plan (const int *perm, const int dim, const FloatComplex alpha, const FloatComplex *A, const int *sizeA, const int *outerSizeA, const FloatComplex beta, FloatComplex *B, const int *outerSizeB, const int maxAutotuningCandidates, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false) |
| std::shared_ptr< hptt::Transpose< DoubleComplex > > | create_plan (const int *perm, const int dim, const DoubleComplex alpha, const DoubleComplex *A, const int *sizeA, const int *outerSizeA, const DoubleComplex beta, DoubleComplex *B, const int *outerSizeB, const int maxAutotuningCandidates, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false) |
| template<> | |
| float | conj (float x) |
| template<> | |
| double | conj (double x) |
| template<> | |
| double | getZeroThreshold< double > () |
| template<> | |
| double | getZeroThreshold< DoubleComplex > () |
| template<> | |
| double | getZeroThreshold< float > () |
| template<> | |
| double | getZeroThreshold< FloatComplex > () |
| void | trashCache (double *A, double *B, int n) |
| template<typename t > | |
| int | hasItem (const std::vector< t > &vec, t value) |
| template<typename t > | |
| void | printVector (const std::vector< t > &vec, const char *label) |
| template<typename t > | |
| void | printVector (const std::list< t > &vec, const char *label) |
| void | getPrimeFactors (int n, std::list< int > &primeFactors) |
| template<typename t > | |
| int | findPos (t value, const std::vector< t > &array) |
| int | findPos (int value, const int *array, int n) |
| int | factorial (int n) |
| void | accountForRowMajor (const int *sizeA, const int *outerSizeA, const int *outerSizeB, const int *perm, int *tmpSizeA, int *tmpOuterSizeA, int *tmpouterSizeB, int *tmpPerm, const int dim, const bool useRowMajor) |
enum hptt::SelectionMethod { ESTIMATE, MEASURE, PATIENT, CRAZY }: determines the duration of the auto-tuning process.
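Auto-tuning, in general, means running several candidate implementations on the real data, timing each one, and keeping the fastest. The `create_plan` overloads that take `maxAutotuningCandidates` suggest that a longer selection method mainly allows more candidates to be tried. The following is a generic sketch under that assumption, not HPTT's implementation: `autotuneSketch` is a hypothetical name, candidates are plain callables that write into B, and B is backed up and restored after every timed run so that the caller's output data survives tuning.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical helper, not part of HPTT: time up to maxCandidates candidate
// implementations on the real output buffer B and return the index of the
// fastest one. B is backed up first and restored after each candidate, so the
// caller sees the original output data once tuning finishes.
template <typename T>
std::size_t autotuneSketch(const std::vector<std::function<void(T*)>>& candidates,
                           T* B, std::size_t sizeB, std::size_t maxCandidates)
{
    assert(!candidates.empty() && maxCandidates > 0);
    const std::vector<T> backup(B, B + sizeB);            // preserve original B
    const std::size_t n = std::min(candidates.size(), maxCandidates);

    std::size_t best = 0;
    double bestTime = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < n; ++c) {
        const auto start = std::chrono::steady_clock::now();
        candidates[c](B);                                  // candidate writes into B
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        if (elapsed.count() < bestTime) {
            bestTime = elapsed.count();
            best = c;
        }
        std::copy(backup.begin(), backup.end(), B);        // undo the candidate's writes
    }
    return best;
}
```

Restoring B before the next candidate also keeps the comparison fair when beta is nonzero, since every candidate then reads the same initial B.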
std::shared_ptr< hptt::Transpose< float > > hptt::create_plan ( const int *perm,
    const int dim,
    const float alpha,
    const float *A,
    const int *sizeA,
    const int *outerSizeA,
    const float beta,
    float *B,
    const int *outerSizeB,
    const SelectionMethod selectionMethod,
    const int numThreads,
    const int *threadIds = nullptr,
    const bool useRowMajor = false )
Creates a Tensor Transposition plan.
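A tensor transposition plan is a data structure that encodes the execution of the tensor transposition. HPTT supports tensor transpositions of the form:

$$B_{\pi(i_0,i_1,\ldots,i_{d-1})} = \alpha \cdot A_{i_0,i_1,\ldots,i_{d-1}} + \beta \cdot B_{\pi(i_0,i_1,\ldots,i_{d-1})},$$

where $\pi$ is the index permutation given by perm, $d$ = dim, and $\alpha$, $\beta$ are the scalars alpha and beta.
The plan can be reused for several transpositions.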
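To make the operation concrete, here is a naive single-threaded reference that computes B = alpha * A (permuted) + beta * B element by element. It is a sketch for checking results, not HPTT's optimized implementation. Assumptions: column-major layout (the default when useRowMajor is false), dimension d of B corresponds to dimension perm[d] of A (so B's extent along d is sizeA[perm[d]]), and outerSizeA/outerSizeB are the allocated extents of each dimension; `referenceTranspose` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference for B[perm(i)] = alpha * A[i] + beta * B[perm(i)].
// Assumptions (not taken from HPTT's source): column-major layout, and
// dimension d of B corresponds to dimension perm[d] of A. outerSizeA and
// outerSizeB are the allocated extents; elements outside sizeA are never
// touched, which is what makes sub-tensor transpositions possible.
template <typename T>
void referenceTranspose(const std::vector<int>& perm, T alpha, const T* A,
                        const std::vector<int>& sizeA,
                        const std::vector<int>& outerSizeA, T beta, T* B,
                        const std::vector<int>& outerSizeB)
{
    const int dim = static_cast<int>(perm.size());
    // Column-major strides: stride[0] = 1, stride[k] = stride[k-1] * outer[k-1].
    std::vector<std::size_t> strideA(dim, 1), strideB(dim, 1);
    for (int k = 1; k < dim; ++k) {
        strideA[k] = strideA[k - 1] * outerSizeA[k - 1];
        strideB[k] = strideB[k - 1] * outerSizeB[k - 1];
    }
    for (int k = 0; k < dim; ++k)
        if (sizeA[k] == 0) return;                 // empty tensor: nothing to do

    std::vector<int> idx(dim, 0);                  // current multi-index into A
    while (true) {
        std::size_t offA = 0, offB = 0;
        for (int k = 0; k < dim; ++k) offA += idx[k] * strideA[k];
        for (int d = 0; d < dim; ++d) offB += idx[perm[d]] * strideB[d];
        B[offB] = alpha * A[offA] + beta * B[offB];

        // Odometer increment; the first index varies fastest (column-major).
        int k = 0;
        for (; k < dim; ++k) {
            if (++idx[k] < sizeA[k]) break;        // no carry needed
            idx[k] = 0;                            // wrap and carry to the next dimension
        }
        if (k == dim) break;                       // every index wrapped: done
    }
}
```

For a 2×3 column-major matrix A = {0, 1, 2, 3, 4, 5} with perm = {1, 0}, alpha = 1 and beta = 0, this sketch produces B = {0, 2, 4, 1, 3, 5}, the ordinary matrix transpose.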
| [in] | perm | dim-dimensional array representing the permutation of the indices. |
| [in] | dim | Dimensionality of the tensors |
| [in] | alpha | Scaling factor for A |
| [in] | A | Pointer to the raw data of the input tensor A |
| [in] | sizeA | dim-dimensional array that stores the sizes of each dimension of A |
| [in] | outerSizeA | dim-dimensional array that stores the outer-sizes of each dimension of A. |
| [in] | beta | Scaling factor for B |
| [in,out] | B | Pointer to the raw data of the output tensor B |
| [in] | outerSizeB | dim-dimensional array that stores the outer-sizes of each dimension of B. |
| [in] | selectionMethod | Determines whether auto-tuning is used. See hptt::SelectionMethod for details. ATTENTION: If you enable auto-tuning (e.g., hptt::MEASURE), the output tensor B is used (i.e., written to) during the auto-tuning process. The original data (i.e., A and B) is nevertheless preserved once this function call completes, unless the input tensor A contains invalid values (e.g., NaN, inf). |
| [in] | numThreads | Number of threads that participate in this tensor transposition. |
| [in] | threadIds | Array of OpenMP thread IDs that participate in this tensor transposition. This parameter is only relevant if you call HPTT from within a parallel region (i.e., via execute_expert()). |
| [in] | useRowMajor | This flag indicates whether a row-major memory layout should be used (default: off = column-major). |
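Row-major input can be handled by reinterpretation: a row-major tensor with extents (n0, ..., n(d-1)) occupies memory exactly like a column-major tensor with the extents reversed. The helper below sketches that rewrite. It is a hypothetical stand-in (`toColumnMajorSketch` and `ColumnMajorProblem` are illustrative names) and not necessarily what HPTT's accountForRowMajor does; it assumes that dimension d of B corresponds to dimension perm[d] of A.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type: the same transposition, described column-major.
struct ColumnMajorProblem {
    std::vector<int> perm, sizeA, outerSizeA, outerSizeB;
};

// Hypothetical helper (not HPTT's accountForRowMajor): rewrite a row-major
// transposition as an equivalent column-major one by reversing every extent
// array and renumbering the permutation to match the reversed dimensions.
ColumnMajorProblem toColumnMajorSketch(const std::vector<int>& perm,
                                       const std::vector<int>& sizeA,
                                       const std::vector<int>& outerSizeA,
                                       const std::vector<int>& outerSizeB)
{
    const int dim = static_cast<int>(perm.size());
    assert(sizeA.size() == perm.size() && outerSizeA.size() == perm.size() &&
           outerSizeB.size() == perm.size());

    ColumnMajorProblem p;
    p.sizeA.assign(sizeA.rbegin(), sizeA.rend());
    p.outerSizeA.assign(outerSizeA.rbegin(), outerSizeA.rend());
    p.outerSizeB.assign(outerSizeB.rbegin(), outerSizeB.rend());
    p.perm.resize(dim);
    // Row-major dimension k becomes column-major dimension dim-1-k for both
    // tensors. If B's dimension d maps to A's dimension perm[d], then B's new
    // dimension dim-1-d maps to A's new dimension dim-1-perm[d].
    for (int d = 0; d < dim; ++d)
        p.perm[dim - 1 - d] = dim - 1 - perm[d];
    return p;
}
```

For example, swapping the first two indices of a row-major tensor (perm = {1, 0, 2}) becomes swapping the last two indices of the column-major view (perm = {0, 2, 1}), while a plain 2D transpose (perm = {1, 0}) stays a transpose.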