The Transpose class encodes all information related to the execution of the tensor transposition.
#include <transpose.h>
|
| | Transpose (const int *sizeA, const int *perm, const int *outerSizeA, const int *outerSizeB, const int dim, const floatType *A, const floatType alpha, floatType *B, const floatType beta, const SelectionMethod selectionMethod, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false) |
| |
|
| Transpose (const Transpose &other) |
| |
|
bool | getConjA () noexcept |
| |
|
void | setConjA (bool conjA) noexcept |
| |
|
int | getNumThreads () const noexcept |
| |
|
void | setNumThreads (int numThreads) noexcept |
| |
|
floatType | getAlpha () const noexcept |
| |
|
floatType | getBeta () const noexcept |
| |
|
void | setAlpha (floatType alpha) noexcept |
| | set the scaling factor for A
|
| |
|
void | setBeta (floatType beta) noexcept |
| | set the scaling factor for B
|
| |
| void | setInputPtr (const floatType *A) noexcept |
| | Set the pointer for A.
|
| |
| void | setOutputPtr (floatType *B) noexcept |
| | Set the pointer for B.
|
| |
|
const floatType * | getInputPtr () const noexcept |
| | Get raw-data pointer to A.
|
| |
|
floatType * | getOutputPtr () const noexcept |
| | Get raw-data pointer to B.
|
| |
|
void | resetThreadIds () noexcept |
| | Clears the array that stores the OpenMP threadIds. This function should only be used in conjunction with addThreadId().
|
| |
| void | setMaxAutotuningCandidates (int num) |
| |
| void | addThreadId (int threadId) noexcept |
| |
|
void | printThreadIds () const noexcept |
| |
|
int | getMasterThreadId () const noexcept |
| |
|
void | createPlan () |
| | Creates the plan that encodes the execution of the tensor transposition.
|
| |
| template<bool useStreamingStores = true, bool spawnThreads = true, bool betaIsZero> |
| void | execute_expert () noexcept |
| |
| void | execute () noexcept |
| |
|
void | print () noexcept |
| |
template<typename floatType>
class hptt::Transpose< floatType >
The Transpose class encodes all information related to the execution of the tensor transposition.
Once a transpose (henceforth referred to as plan) t has been created, it can be executed via t->execute(). Moreover, a plan can be reused multiple times. For this purpose you might want to have a look at setInputPtr() and setOutputPtr().
In addition to the normal execute() function, this class also offers the execute_expert() interface. This interface is intended for the expert user and offers more flexibility than execute(). If you want to use the expert interface, you might also want to check out addThreadId() and resetThreadIds().
◆ Transpose()
template<typename floatType >
hptt::Transpose< floatType >::Transpose (const int *sizeA, const int *perm, const int *outerSizeA, const int *outerSizeB, const int dim, const floatType *A, const floatType alpha, floatType *B, const floatType beta, const SelectionMethod selectionMethod, const int numThreads, const int *threadIds=nullptr, const bool useRowMajor=false)
- Parameters
-
| [in] | perm | dim-dimensional array representing the permutation of the indices.
- For instance, perm[] = {1,0,2} denotes the following transposition: $B_{i_1,i_0,i_2} \gets A_{i_0,i_1,i_2}$.
|
| [in] | dim | Dimensionality of the tensors |
| [in] | alpha | scaling factor for A |
| [in] | A | Pointer to the raw-data of the input tensor A |
| [in] | sizeA | dim-dimensional array that stores the sizes of each dimension of A |
| [in] | outerSizeA | dim-dimensional array that stores the outer-sizes of each dimension of A.
- This parameter may be NULL, indicating that the outer-size is equal to sizeA.
- If outerSizeA is not NULL, outerSizeA[i] >= sizeA[i] for all 0 <= i < dim must hold.
- This option enables HPTT to operate on sub-tensors.
|
| [in] | beta | scaling factor for B |
| [in,out] | B | Pointer to the raw-data of the output tensor B |
| [in] | outerSizeB | dim-dimensional array that stores the outer-sizes of each dimension of B.
- This parameter may be NULL, indicating that the outer-size is equal to the perm(sizeA).
- If outerSizeB is not NULL, outerSizeB[i] >= perm(sizeA)[i] for all 0 <= i < dim must hold.
- This option enables HPTT to operate on sub-tensors.
|
| [in] | selectionMethod | Determines whether auto-tuning is used. See hptt::SelectionMethod for details. ATTENTION: If you enable auto-tuning (e.g., hptt::MEASURE), the output data will be used during the auto-tuning process. The original data (i.e., A and B) is nevertheless preserved after this function call completes, unless the input data (i.e., A) contains invalid values (e.g., NaN, inf). |
| [in] | numThreads | number of threads that participate in this tensor transposition. |
| [in] | threadIds | Array of OpenMP threadIds that participate in this tensor transposition. This parameter is only important if you want to call HPTT from within a parallel region (i.e., via execute_expert()). |
| [in] | useRowMajor | This flag indicates whether a row-major memory layout should be used (default: off = column-major). Column-major: indices are stored from left to right (leftmost = stride-1 index). Row-major: indices are stored from right to left (rightmost = stride-1 index). |
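The meaning of perm, alpha, beta and the outer sizes can be sketched with an unoptimized reference routine. This is only an illustration of the semantics described above for the column-major case; the function name referenceTranspose and its loop structure are assumptions and not part of HPTT.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference routine (not part of HPTT). It computes
//   B[perm(i)] = alpha * A[i] + beta * B[perm(i)]
// for column-major tensors, where dimension j of B is dimension perm[j] of A.
// outerSizeA / outerSizeB may be nullptr, meaning "same as the inner size".
template <typename T>
void referenceTranspose(const int* sizeA, const int* perm,
                        const int* outerSizeA, const int* outerSizeB,
                        int dim, const T* A, T alpha, T* B, T beta)
{
    std::vector<std::size_t> strideA(dim), strideB(dim);
    std::size_t total = 1;  // number of elements that are transposed
    for (int i = 0; i < dim; ++i) total *= static_cast<std::size_t>(sizeA[i]);

    // Column-major strides: the leftmost index has stride 1.
    // Outer sizes (if given) determine the strides, inner sizes the loop bounds.
    std::size_t sA = 1, sB = 1;
    for (int i = 0; i < dim; ++i) {
        strideA[i] = sA;
        strideB[i] = sB;
        sA *= static_cast<std::size_t>(outerSizeA ? outerSizeA[i] : sizeA[i]);
        sB *= static_cast<std::size_t>(outerSizeB ? outerSizeB[i] : sizeA[perm[i]]);
    }

    std::vector<int> idx(dim, 0);  // current multi-index into A
    for (std::size_t n = 0; n < total; ++n) {
        std::size_t offA = 0, offB = 0;
        for (int i = 0; i < dim; ++i) {
            offA += idx[i] * strideA[i];
            offB += idx[perm[i]] * strideB[i];
        }
        B[offB] = alpha * A[offA] + beta * B[offB];

        // Advance the multi-index like an odometer (leftmost index fastest).
        for (int i = 0; i < dim; ++i) {
            if (++idx[i] < sizeA[i]) break;
            idx[i] = 0;
        }
    }
}
```

For a 2x3 column-major matrix and perm = {1,0}, this reduces to an ordinary matrix transpose scaled by alpha and accumulated into beta * B. Passing an outerSizeA larger than sizeA makes the routine read a sub-tensor out of a larger allocation.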
◆ addThreadId()
template<typename floatType >
This thread-safe function adds an OpenMP threadId to the set of threads that will participate in this tensor transposition. This function is only required in conjunction with the execute_expert() interface, where the transposition is executed from within a parallel region (i.e., HPTT does not spawn the threads). It is the programmer's responsibility to specify the correct thread IDs that participate in this call.
- Parameters
-
| [in] | threadId | An OpenMP threadId |
◆ execute()
template<typename floatType >
Executes the transposition. This function requires that the plan has already been created via the createPlan() function.
◆ execute_expert()
template<typename floatType >
template<bool useStreamingStores = true, bool spawnThreads = true, bool betaIsZero>
Executes the transposition. This function requires that the plan has already been created via the createPlan() function. It behaves similarly to execute(), but it offers additional template parameters that improve performance for very small tensor transpositions and give the caller more flexibility.
- Parameters
-
| [in] | useStreamingStores | If and only if this variable is set, HPTT uses streaming stores, which improve performance because they avoid the write-allocate traffic incurred by writing to B. However, the user might sometimes want to avoid streaming stores because the packed data fits into cache and is reused shortly afterwards (e.g., within BLAS packing routines). |
| [in] | spawnThreads | If the variable is set, the threads will be spawned from within this call, otherwise it is expected that this function call executes from within a parallel region. |
| [in] | betaIsZero | Only set this variable if beta is zero. |
◆ setInputPtr()
template<typename floatType >
Set the pointer for A.
This feature is especially useful if one wants to reuse the transposition over multiple invocations.
◆ setMaxAutotuningCandidates()
template<typename floatType >
◆ setOutputPtr()
template<typename floatType >
Set the pointer for B.
This feature is especially useful if one wants to reuse the transposition over multiple invocations.
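The reuse pattern enabled by setInputPtr() and setOutputPtr() can be sketched with a small stand-in class. MiniPlan2D below is hypothetical and restricted to 2D column-major matrices; it only mirrors the setter names documented here and does not reflect how HPTT stores or executes its plans.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 2D transposition plan (not the HPTT class).
// The problem description is stored once; the data pointers can be swapped
// between executions, as setInputPtr() / setOutputPtr() allow.
template <typename T>
class MiniPlan2D {
public:
    MiniPlan2D(int rows, int cols, const T* A, T* B)
        : rows_(rows), cols_(cols), A_(A), B_(B) {}

    void setInputPtr(const T* A) noexcept { A_ = A; }
    void setOutputPtr(T* B) noexcept { B_ = B; }

    // Column-major: A is rows x cols, B is cols x rows, B(j,i) = A(i,j).
    void execute() noexcept {
        const std::size_t ldA = static_cast<std::size_t>(rows_);
        const std::size_t ldB = static_cast<std::size_t>(cols_);
        for (int j = 0; j < cols_; ++j)
            for (int i = 0; i < rows_; ++i)
                B_[j + ldB * i] = A_[i + ldA * j];
    }

private:
    int rows_, cols_;
    const T* A_;
    T* B_;
};
```

A caller would construct the plan once, call execute(), then point it at a second pair of buffers with the two setters and call execute() again, without repeating any setup work.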