|
High-Performance Tensor Transposition (HPTT) C++ Library
A C++ library for high-performance multi-threaded tensor transpositions.
|
HPTT supports tensor transpositions of the general form:
\f[ \mathcal{B}_{\pi(i_0,i_1,...,i_{d-1})} \gets \alpha * \mathcal{A}_{i_0,i_1,...,i_{d-1}} + \beta * \mathcal{B}_{\pi(i_0,i_1,...,i_{d-1})}, \f]
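where \f$\alpha\f$ and \f$\beta\f$ are scalars and \f$\mathcal{A}\f$ and \f$\mathcal{B}\f$ are d-dimensional tensors (i.e., multi-dimensional arrays).
HPTT assumes a column-major data layout, thus indices are stored from left to right (e.g., \f$i_0\f$ is the stride-1 index in \f$\mathcal{A}_{i_0,i_1,...,i_{d-1}}\f$).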
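The column-major convention can be made concrete with a small sketch. The helper names `columnMajorStrides` and `linearOffset` below are hypothetical and not part of HPTT; they only illustrate how the leftmost index gets stride 1 and how a multi-index maps to a linear memory offset under that assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of HPTT): strides of a densely packed
// column-major tensor. The leftmost index i_0 has stride 1.
std::vector<std::size_t> columnMajorStrides(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> strides(sizes.size());
    std::size_t stride = 1;
    for (std::size_t d = 0; d < sizes.size(); ++d) {
        strides[d] = stride;   // stride of index i_d
        stride *= sizes[d];    // the next index jumps over all previous ones
    }
    return strides;
}

// Hypothetical helper (not part of HPTT): linear offset of the element
// addressed by the multi-index (i_0, ..., i_{d-1}).
std::size_t linearOffset(const std::vector<std::size_t>& index,
                         const std::vector<std::size_t>& sizes) {
    assert(index.size() == sizes.size());
    const std::vector<std::size_t> strides = columnMajorStrides(sizes);
    std::size_t offset = 0;
    for (std::size_t d = 0; d < index.size(); ++d) {
        assert(index[d] < sizes[d]);
        offset += index[d] * strides[d];
    }
    return offset;
}
```

For a tensor of size 2 x 3 x 4, the strides are 1, 2 and 6, so incrementing i_0 moves to the adjacent element in memory.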
You must have a working C++ compiler with C++11 support. I have tested HPTT with:
Clone the repository into a desired directory and change to that location:
git clone https://github.com/springer13/hptt.git
cd hptt
export CXX=<desired compiler>
Now you have several options to build the desired version of the library:
make avx
make arm
make scalar
This should create 'libhptt.so' inside the ./lib folder.
In general HPTT is used as follows:
#include <hptt.h>
// allocate tensors
float *A = ...
float *B = ...
// specify permutation and size
const int dim = 6;
int perm[dim] = {5,2,0,4,1,3};
int size[dim] = {48,28,48,28,28};
// create a plan (shared_ptr)
auto plan = hptt::create_plan( perm, dim,
                               alpha, A, size, NULL,
                               beta, B, NULL,
                               hptt::ESTIMATE, numThreads);
// execute the transposition
plan->execute();
The example above does not use any auto-tuning; it relies solely on HPTT's performance model. To activate auto-tuning, use hptt::MEASURE or hptt::PATIENT instead of hptt::ESTIMATE.
Please refer to the hptt::Transpose class for additional information or to hptt::create_plan().
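When experimenting with small tensors, it can help to compare the output of plan->execute() against a naive reference. The following is a minimal sketch of the general form given above, not HPTT's implementation. The function name `referenceTranspose` is hypothetical, and the sketch assumes that index j of B corresponds to index perm[j] of A (so the size of B along j is sizeA[perm[j]]); check that convention against the hptt::create_plan() documentation before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference (not part of HPTT): naive out-of-place transposition
//   B[pi(i)] = alpha * A[i] + beta * B[pi(i)]
// for densely packed column-major tensors.
// Assumed convention: index j of B is index perm[j] of A.
void referenceTranspose(const std::vector<int>& perm, float alpha,
                        const std::vector<float>& A, const std::vector<int>& sizeA,
                        float beta, std::vector<float>& B) {
    const std::size_t dim = perm.size();
    assert(sizeA.size() == dim);

    // Total element count (A is walked linearly, i_0 fastest).
    std::size_t total = 1;
    for (std::size_t d = 0; d < dim; ++d) total *= static_cast<std::size_t>(sizeA[d]);
    assert(A.size() == total && B.size() == total);

    // Column-major strides of B, whose j-th extent is sizeA[perm[j]].
    std::vector<std::size_t> strideB(dim);
    std::size_t stride = 1;
    for (std::size_t j = 0; j < dim; ++j) {
        strideB[j] = stride;
        stride *= static_cast<std::size_t>(sizeA[perm[j]]);
    }

    // Multi-index into A, advanced like an odometer.
    std::vector<std::size_t> idx(dim, 0);
    for (std::size_t offA = 0; offA < total; ++offA) {
        std::size_t offB = 0;
        for (std::size_t j = 0; j < dim; ++j) offB += idx[perm[j]] * strideB[j];
        B[offB] = alpha * A[offA] + beta * B[offB];

        // Advance the multi-index; i_0 changes fastest.
        for (std::size_t d = 0; d < dim; ++d) {
            if (++idx[d] < static_cast<std::size_t>(sizeA[d])) break;
            idx[d] = 0;
        }
    }
}
```

The sketch favors clarity over speed: it recomputes the offset into B for every element and is only meant for validating results on small inputs.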
An extensive example is provided here: ./benchmark/benchmark.cpp.
The benchmark is the same as the original TTC benchmark for tensor transpositions.
You can compile the benchmark via:
cd benchmark
make
Before running the benchmark, please modify the number of threads and the thread affinity within the benchmark.sh file. To run the benchmark just use:
./benchmark.sh
This will create the file hptt_benchmark.dat, which contains all the runtime information of HPTT and the reference implementation.
If you want to refer to HPTT as part of a research paper, please cite the following article (pdf):