StarPU is a runtime system that supports heterogeneous multicore machines. While many efforts are devoted to designing efficient computation kernels for those architectures (e.g. implementing BLAS kernels on GPUs), StarPU not only takes care of offloading such kernels (and implementing data coherency across the machine), but also makes sure the kernels are executed as efficiently as possible.

To enable hwloc support, set HWLOC=yes (requires hwloc)
To enable OpenMP, set OMP=yes
To enable trace support, set FXT=yes (requires libfxt)
To enable HDF5, set HDF5=yes (requires hdf5)
To enable cluster support, set CLUSTER=yes

Optional requirements (auto-detected):
- blas
- MPI (mpich or openmpi)
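
As a rough illustration of the options above, the following sketch sets two of them as environment variables before a build. It assumes the build script reads these variables from the environment, which this README does not state explicitly. The build command itself is left as a commented placeholder because its name is not given here.

```shell
#!/bin/sh
# Assumption: the build script picks these options up from the environment.
# Enable hwloc and trace support; hwloc and libfxt must already be installed.
export HWLOC=yes
export FXT=yes

# Show the options that will be passed to the build.
echo "HWLOC=$HWLOC FXT=$FXT"

# Hypothetical placeholder: run the actual build command for this package here.
# ./build-script
```

Options that are left unset (OMP, HDF5, CLUSTER in this sketch) are simply not requested.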