High-Performance Tensor Transposition (HPTT) C++ Library
A C++ library for high-performance multi-threaded tensor transpositions.
hptt::Transpose< floatType > Class Template Reference

The Transpose class encodes all information related to the execution of the tensor transposition.

#include <transpose.h>

Public Member Functions

 Transpose (const int *sizeA, const int *perm, const int *outerSizeA, const int *outerSizeB, const int dim, const floatType *A, const floatType alpha, floatType *B, const floatType beta, const SelectionMethod selectionMethod, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false)
 
 Transpose (const Transpose &other)
 
bool getConjA () noexcept
 
void setConjA (bool conjA) noexcept
 
int getNumThreads () const noexcept
 
void setNumThreads (int numThreads) noexcept
 
floatType getAlpha () const noexcept
 
floatType getBeta () const noexcept
 
void setAlpha (floatType alpha) noexcept
 set the scaling factor for A
 
void setBeta (floatType beta) noexcept
 set the scaling factor for B
 
void setInputPtr (const floatType *A) noexcept
 Set the pointer for A.
 
void setOutputPtr (floatType *B) noexcept
 Set the pointer for B.
 
const floatType * getInputPtr () const noexcept
 Get raw-data pointer to A.
 
floatType * getOutputPtr () const noexcept
 Get raw-data pointer to B.
 
void resetThreadIds () noexcept
 Clears the array that stores the OpenMP threadIds. This function should only be used in conjunction with addThreadId().
 
void setMaxAutotuningCandidates (int num)
 
void addThreadId (int threadId) noexcept
 
void printThreadIds () const noexcept
 
int getMasterThreadId () const noexcept
 
void createPlan ()
 Creates the plan that encodes the execution of the tensor transposition.
 
template<bool useStreamingStores = true, bool spawnThreads = true, bool betaIsZero>
void execute_expert () noexcept
 
void execute () noexcept
 
void print () noexcept
 

Detailed Description

template<typename floatType>
class hptt::Transpose< floatType >

The Transpose class encodes all information related to the execution of the tensor transposition.

Once a transpose t (henceforth referred to as a plan) has been created, it can be executed via t->execute(). Moreover, a plan can be reused multiple times. For this purpose you might want to have a look at the following functions:
  • setInputPtr()
  • setOutputPtr()
  • setAlpha()
  • setBeta()

In addition to the normal execute() function, this class also offers the execute_expert() interface. This interface is intended for expert users and offers more flexibility than execute(). If you want to use the expert interface, you might want to check out the following functions as well:
  • resetThreadIds()
  • addThreadId()

Constructor & Destructor Documentation

◆ Transpose()

template<typename floatType >
hptt::Transpose< floatType >::Transpose ( const int *  sizeA,
const int *  perm,
const int *  outerSizeA,
const int *  outerSizeB,
const int  dim,
const floatType *  A,
const floatType  alpha,
floatType *  B,
const floatType  beta,
const SelectionMethod  selectionMethod,
const int  numThreads,
const int *  threadIds = nullptr,
const bool  useRowMajor = false 
)
Parameters
[in] perm: dim-dimensional array representing the permutation of the indices.
  • For instance, perm[] = {1,0,2} denotes the following transposition: $B_{i_1,i_0,i_2} \gets A_{i_0,i_1,i_2}$.
[in] dim: Dimensionality of the tensors.
[in] alpha: Scaling factor for A.
[in] A: Pointer to the raw data of the input tensor A.
[in] sizeA: dim-dimensional array that stores the size of each dimension of A.
[in] outerSizeA: dim-dimensional array that stores the outer-sizes of each dimension of A.
  • This parameter may be NULL, indicating that the outer-size is equal to sizeA.
  • If outerSizeA is not NULL, outerSizeA[i] >= sizeA[i] for all 0 <= i < dim must hold.
  • This option enables HPTT to operate on sub-tensors.
[in] beta: Scaling factor for B.
[in,out] B: Pointer to the raw data of the output tensor B.
[in] outerSizeB: dim-dimensional array that stores the outer-sizes of each dimension of B.
  • This parameter may be NULL, indicating that the outer-size is equal to perm(sizeA).
  • If outerSizeB is not NULL, outerSizeB[i] >= perm(sizeA)[i] for all 0 <= i < dim must hold.
  • This option enables HPTT to operate on sub-tensors.
[in] selectionMethod: Determines whether auto-tuning should be used. See hptt::SelectionMethod for details. ATTENTION: If you enable auto-tuning (e.g., hptt::MEASURE), the output tensor B will be used during the auto-tuning process. The original data (i.e., A and B), however, is preserved after this function call completes, unless your input data (i.e., A) contains invalid values (e.g., NaN, inf).
[in] numThreads: Number of threads that participate in this tensor transposition.
[in] threadIds: Array of OpenMP threadIds that participate in this tensor transposition. This parameter is only important if you want to call HPTT from within a parallel region (i.e., via execute_expert()).
[in] useRowMajor: This flag indicates whether a row-major memory layout should be used (default: off = column-major).
  • Column-major: indices are stored from left to right (leftmost index = stride-1 index).
  • Row-major: indices are stored from right to left (rightmost index = stride-1 index).
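The parameter conventions above (perm, alpha, beta, the outer-sizes, and the memory layout) can be pinned down with a naive reference sketch. The helper referenceTranspose below is hypothetical and not part of HPTT; it ignores threading, auto-tuning, and vectorization entirely, and skipping the read of B when beta == 0 is an assumption borrowed from the BLAS convention rather than something stated in this documentation.

```cpp
#include <cassert>
#include <vector>

// Naive reference sketch (hypothetical helper, not HPTT's optimized kernel) of
//   B_{i_perm[0], ..., i_perm[dim-1]} = alpha * A_{i_0, ..., i_{dim-1}} + beta * B_{...}
// outerSizeA / outerSizeB may be nullptr, meaning "equal to sizeA / perm(sizeA)".
template <typename T>
void referenceTranspose(const int* sizeA, const int* perm, const int* outerSizeA,
                        const int* outerSizeB, int dim, const T* A, T alpha,
                        T* B, T beta, bool useRowMajor = false)
{
    std::vector<int> sizeB(dim), outA(dim), outB(dim);
    for (int k = 0; k < dim; ++k) {
        sizeB[k] = sizeA[perm[k]];  // B's k-th extent is perm(sizeA)[k]
        outA[k] = outerSizeA ? outerSizeA[k] : sizeA[k];
        outB[k] = outerSizeB ? outerSizeB[k] : sizeB[k];
        assert(outA[k] >= sizeA[k] && outB[k] >= sizeB[k]);
    }
    // Strides come from the outer-sizes: the leftmost index has stride 1 in
    // column-major layout, the rightmost index has stride 1 in row-major layout.
    auto strides = [&](const std::vector<int>& outer) {
        std::vector<long> s(dim, 1);
        if (useRowMajor) {
            for (int k = dim - 2; k >= 0; --k) s[k] = s[k + 1] * outer[k + 1];
        } else {
            for (int k = 1; k < dim; ++k) s[k] = s[k - 1] * outer[k - 1];
        }
        return s;
    };
    const std::vector<long> strideA = strides(outA);
    const std::vector<long> strideB = strides(outB);

    long total = 1;
    for (int k = 0; k < dim; ++k) total *= sizeA[k];

    std::vector<int> idx(dim, 0);  // current multi-index (i_0, ..., i_{dim-1}) of A
    for (long n = 0; n < total; ++n) {
        long offA = 0, offB = 0;
        for (int k = 0; k < dim; ++k) {
            offA += idx[k] * strideA[k];
            offB += idx[perm[k]] * strideB[k];  // B's k-th index is i_{perm[k]}
        }
        // Assumption (BLAS convention): B is not read when beta == 0.
        B[offB] = (beta == T(0)) ? alpha * A[offA] : alpha * A[offA] + beta * B[offB];
        // Advance the multi-index like an odometer, leftmost digit fastest.
        for (int k = 0; k < dim && ++idx[k] == sizeA[k]; ++k) idx[k] = 0;
    }
}
```

For sizeA = {2,3} and perm = {1,0} this reduces to an ordinary matrix transpose; passing outer-sizes larger than the sizes makes the same loop touch only a sub-tensor of a larger allocation.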

Member Function Documentation

◆ addThreadId()

template<typename floatType >
void hptt::Transpose< floatType >::addThreadId ( int  threadId)
inline noexcept

This thread-safe function adds an OpenMP threadId to the set of threads that will participate in this tensor transposition. It is only required in conjunction with the execute_expert() interface when the transposition is executed from within a parallel region (i.e., HPTT does not spawn the threads). It is the programmer's responsibility to specify the correct thread IDs that participate in this call.

Parameters
[in] threadId: An OpenMP threadId
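The thread-safety requirement of addThreadId() can be illustrated with a small stand-in. ThreadIdSet and currentThreadId are hypothetical names, not HPTT API; the sketch only shows one common way (a named OpenMP critical section) to let every thread of an enclosing parallel region register its ID safely. It also compiles without OpenMP, in which case only thread 0 registers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in (not HPTT's class) for the resetThreadIds()/addThreadId()
// bookkeeping. Threads of an enclosing parallel region register their own IDs;
// the named critical section makes concurrent add() calls thread-safe.
class ThreadIdSet {
public:
    void reset() { ids_.clear(); }
    void add(int threadId) {
        assert(threadId >= 0);  // OpenMP thread IDs are non-negative
#pragma omp critical(thread_id_set)
        ids_.push_back(threadId);
    }
    std::vector<int> sorted() const {
        std::vector<int> v = ids_;
        std::sort(v.begin(), v.end());
        return v;
    }

private:
    std::vector<int> ids_;
};

// OpenMP thread ID of the caller; 0 when compiled without OpenMP support.
inline int currentThreadId() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}
```

Inside a #pragma omp parallel region, each thread would call add(currentThreadId()). With HPTT, the analogous steps would be resetThreadIds() to clear any previous set and addThreadId() per participating thread, before invoking execute_expert() with spawnThreads set to false.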

◆ execute()

template<typename floatType >
void hptt::Transpose< floatType >::execute ( )
noexcept

Executes the transposition. This function requires that the plan has already been created via the createPlan() function.

◆ execute_expert()

template<typename floatType >
template<bool useStreamingStores = true, bool spawnThreads = true, bool betaIsZero>
void hptt::Transpose< floatType >::execute_expert ( )
noexcept

Executes the transposition. This function requires that the plan has already been created via the createPlan() function. It behaves similarly to the execute() function but offers additional template parameters that can improve performance for very small tensor transpositions. Moreover, it offers more flexibility.

Parameters
[in] useStreamingStores: Iff this variable is set, HPTT will use streaming stores, which improve performance because they avoid the write-allocate traffic incurred by the write to B. However, the user might sometimes want to avoid streaming stores because the packed data fits into cache and is reused shortly afterwards (e.g., within BLAS packing routines).
[in] spawnThreads: If this variable is set, the threads will be spawned from within this call; otherwise, this function is expected to be called from within a parallel region.
[in] betaIsZero: Only set this variable if beta is zero.
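Why a compile-time betaIsZero flag can pay off is easiest to see in a toy update loop. scaleUpdate is a hypothetical function, not HPTT's kernel: when betaIsZero is true, the loop never reads B, so the compiler can drop that load and stale contents of B, even NaN or inf, cannot leak into the result (0 * NaN is still NaN). The sketch asserts the precondition in debug builds; in a release build, setting the flag with a nonzero beta would silently ignore beta, which illustrates why the flag must only be set when beta is zero.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy kernel (not HPTT's): B = alpha * A + beta * B, element-wise.
// betaIsZero is a compile-time constant, so the compiler can discard the unused
// branch; when it is true, B is only written, never read.
template <bool betaIsZero, typename T>
void scaleUpdate(const T* A, T* B, std::size_t n, T alpha, T beta)
{
    assert(!betaIsZero || beta == T(0));  // the flag is only valid for beta == 0
    for (std::size_t i = 0; i < n; ++i) {
        if (betaIsZero)
            B[i] = alpha * A[i];                // no load of B: stale NaN/inf cannot leak in
        else
            B[i] = alpha * A[i] + beta * B[i];  // reads B (0 * NaN would still be NaN)
    }
}
```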

◆ setInputPtr()

template<typename floatType >
void hptt::Transpose< floatType >::setInputPtr ( const floatType *  A)
inline noexcept

Set the pointer for A.

This feature is especially useful if one wants to reuse the transposition over multiple invocations.

◆ setMaxAutotuningCandidates()

template<typename floatType >
void hptt::Transpose< floatType >::setMaxAutotuningCandidates ( int  num)
inline

setMaxAutotuningCandidates() enables users to specify the maximum number of candidates that should be tested during the auto-tuning phase.
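The interplay between setMaxAutotuningCandidates() and the preservation guarantee for A and B (see the selectionMethod parameter) can be sketched as a capped timing loop. pickFastest is hypothetical and not necessarily how HPTT implements its search; it times at most maxCandidates candidate kernels, each of which writes into B, and restores B from a backup copy afterwards. Because it relies on wall-clock timing, which candidate wins is not deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical capped auto-tuning loop (not HPTT's implementation).
// Each candidate writes its result into the shared output buffer B, so B is saved
// first and restored afterwards, keeping the caller's output data intact.
template <typename T>
std::size_t pickFastest(const std::vector<std::function<void(T*)>>& candidates,
                        std::vector<T>& B, std::size_t maxCandidates)
{
    assert(!candidates.empty() && maxCandidates > 0);
    const std::vector<T> backup = B;  // keep the caller's output data
    const std::size_t n = std::min(candidates.size(), maxCandidates);
    std::size_t best = 0;
    double bestTime = 0.0;
    for (std::size_t c = 0; c < n; ++c) {
        const auto t0 = std::chrono::steady_clock::now();
        candidates[c](B.data());  // candidate overwrites B
        const std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
        if (c == 0 || dt.count() < bestTime) {
            best = c;
            bestTime = dt.count();
        }
    }
    B = backup;  // restore the original contents of B
    return best;
}
```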

◆ setOutputPtr()

template<typename floatType >
void hptt::Transpose< floatType >::setOutputPtr ( floatType *  B)
inline noexcept

Set the pointer for B.

This feature is especially useful if one wants to reuse the transposition over multiple invocations.
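The reuse pattern that setInputPtr() and setOutputPtr() enable can be sketched with a deliberately tiny, hypothetical 2-D plan class (MatrixTransposePlan is not part of HPTT). The plan fixes the shape and scaling factor once; later executions only swap the data pointers, which is how a single plan can serve a whole batch of equally shaped tensors.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2-D plan (not HPTT's Transpose class) illustrating the reuse pattern
// behind setInputPtr()/setOutputPtr(): the shape and scaling factor are fixed when
// the plan is built, while the data pointers can be swapped between executions.
template <typename T>
class MatrixTransposePlan {
public:
    MatrixTransposePlan(int rows, int cols, T alpha, const T* A, T* B)
        : rows_(rows), cols_(cols), alpha_(alpha), A_(A), B_(B) {
        assert(rows > 0 && cols > 0);
    }
    void setInputPtr(const T* A) noexcept { A_ = A; }
    void setOutputPtr(T* B) noexcept { B_ = B; }

    // Column-major layout: A is rows x cols, B is cols x rows, B(j,i) = alpha * A(i,j).
    void execute() const noexcept {
        for (int j = 0; j < cols_; ++j)
            for (int i = 0; i < rows_; ++i)
                B_[j + static_cast<std::size_t>(i) * cols_] =
                    alpha_ * A_[i + static_cast<std::size_t>(j) * rows_];
    }

private:
    int rows_, cols_;
    T alpha_;
    const T* A_;
    T* B_;
};
```

With HPTT itself, the analogous sequence would be to create and plan the transposition once, then call setInputPtr(), setOutputPtr(), and execute() for each new pair of buffers.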


The documentation for this class was generated from the following file: