
Tensor Contraction Code Generator

The Tensor Contraction Code Generator (TCCG) generates high-performance, vectorized (and optionally parallel) C code for tensor contractions.

From a computational perspective, tensors can be interpreted as higher-dimensional matrices or simply as multidimensional arrays; likewise, tensor contractions are a generalization of matrix-matrix multiplication to higher dimensions. For instance, A[i,k], B[k,j] and C[i,j] denote two-dimensional tensors (i.e., matrices), and C[i,j] = A[i,k] * B[k,j] represents a tensor contraction where the sum over 'k' as well as the loops over 'i' and 'j' are implicit. Further examples of tensor contractions are: C[i0,j0,j1] = A[i0,k0] * B[j1,k0,j0]; C[i0,j0,j1,i1] = A[i0,k0,i1] * B[j1,k0,j0]; C[i0,j0,j1,i1] = A[k0,i0,k1,i1] * B[k1,j1,k0,j0]; and so on.
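
For reference, these contractions can be written down with numpy.einsum (a plain Python sketch for illustration only; numpy is not used by TCCG, and the sizes below are arbitrary):

    import numpy as np

    # C[i,j] = A[i,k] * B[k,j]  -- ordinary matrix-matrix multiplication
    A = np.random.rand(24, 8)        # indices: i, k
    B = np.random.rand(8, 16)        # indices: k, j
    C = np.einsum('ik,kj->ij', A, B)

    # C[i0,j0,j1] = A[i0,k0] * B[j1,k0,j0]
    # letter mapping: a=i0, b=j0, c=j1, x=k0 (k0 is summed over)
    A = np.random.rand(24, 8)        # indices: i0, k0
    B = np.random.rand(16, 8, 12)    # indices: j1, k0, j0
    C = np.einsum('ax,cxb->abc', A, B)
    print(C.shape)                   # (24, 12, 16), i.e., the sizes of i0, j0, j1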

Current version: v0.1.2

Key Features


  • TCCG generates high-performance vectorized C code
  • TCCG generates code based on three different approaches:
    • GEMM-like Tensor-Tensor Multiplication (GETT): This novel approach to tensor contractions is at the core of our latest publication (see below).
    • Transpose-Transpose-GEMM-Transpose (TTGT)
    • Loops-over-GEMM (LoG)
  • Shared-memory parallelism
    • TTGT, LoG, GETT
  • Support for single- and double-precision
  • Auto-Fine-Tuning:
    • Automatically explores a search space of promising implementation candidates
    • The fastest candidate will be selected and returned automatically
    • A performance model guides the search
    • The search space can be limited by the user (via the --maxImplementations=N command line argument; see the example after this list)
  • Support for multiple instruction sets:
    • AVX2: GETT, TTGT, LoG
    • AVX512: GETT, TTGT, LoG (experimental)
    • CUDA: TTGT, LoG
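
For instance, the auto-tuning search space mentioned above can be limited to at most four implementation candidates by combining --maxImplementations with the remaining command line arguments (an illustrative invocation; run tccg --help for the full list of options):

    tccg --arch=avx2 --numThreads=1 --floatType=s --maxImplementations=4 example.tccg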

Advantages of GETT


GETT's advantages are manifold:

  • GETT-based code is fully vectorized and exploits the cache hierarchy.
  • Sub-tensors are packed into the caches as needed; thus, GETT avoids the explicit transposition overhead incurred by TTGT.
  • The stride-one index is preserved while packing the sub-tensors into a specified level of the cache hierarchy.
  • No additional workspace is required (except for small buffers which fit into the caches).
  • The arithmetic intensity is retained for any given tensor contraction.

While GETT exhibits excellent performance across a wide range of tensor contractions, its performance for bandwidth-bound tensor contractions is especially outstanding.
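
As a back-of-the-envelope illustration of arithmetic intensity (flops per byte), consider the following Python sketch; it assumes that each tensor is streamed through memory exactly once and is not part of TCCG:

    # Rough arithmetic intensity of C[i,j] = A[i,k] * B[k,j],
    # assuming every tensor is read or written exactly once.
    def arithmetic_intensity(i, j, k, bytes_per_element=4):   # 4 bytes = single precision
        flops = 2.0 * i * j * k                               # one multiply and one add per update
        elements_moved = i * k + k * j + i * j                # A, B, and C touched once each
        return flops / (elements_moved * bytes_per_element)

    print(arithmetic_intensity(24, 24, 24))                   # low intensity -> bandwidth-bound
    print(arithmetic_intensity(2400, 2400, 2400))             # high intensity -> compute-bound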

For further information, please see our paper (see the Citation section below).

Requirements


To use TCCG, a working C compiler, a BLAS library (e.g., Intel's MKL), and the High-Performance Tensor Transposition (HPTT) library are required:

  • Intel's ICC (>= v15.0, recommended) or g++ (>= v4.8, experimental)
  • Some BLAS library (e.g., BLIS, ATLAS)
  • High-Performance Tensor Transposition (HPTT) library
  • Python (tested with v2.7.5 and v2.7.9)
  • Tensor Contraction Library TCL (OPTIONAL)

Install


  1. Clone the repository into a desired directory and change to that location:

    git clone https://github.com/HPAC/tccg.git
    cd tccg

  2. Install TCCG:

    python setup.py install --user

  3. Export the TCCG_ROOT environment variable (add to your .bashrc):

    export TCCG_ROOT=$(pwd)

  4. Set up your BLAS library within $TCCG_ROOT/config.cfg (default: mkl).

  5. You might have to add the installed location to your PATH environment variable:

    export PATH=$PATH:~/.local/bin

Getting Started


Please run tccg --help to get an overview of TCCG's parameters.

Here is an example input file for TCCG:

C[a,b,i,j] = A[i,m,a] * B[m,j,b]
a = 24
b = 24
i = 24
j = 24
m = 24

TCCG command line arguments:

tccg --arch=avx2 --numThreads=1 --floatType=s example.tccg
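
The result of the generated code can be cross-checked against a plain numpy reference of the same contraction (a sketch for illustration; numpy is not a TCCG dependency):

    import numpy as np

    # C[a,b,i,j] = A[i,m,a] * B[m,j,b] with all indices of size 24 (see example.tccg above)
    a = b = i = j = m = 24
    A = np.random.rand(i, m, a).astype(np.float32)   # --floatType=s corresponds to single precision
    B = np.random.rand(m, j, b).astype(np.float32)
    C_ref = np.einsum('ima,mjb->abij', A, B)
    print(C_ref.shape)                               # (24, 24, 24, 24)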

Further examples (.tccg files) can be generated via:

python benchmark/benchmark.py

Benchmark


TCCG provides a benchmark for tensor contractions.

python benchmark.py

This will generate the input files (.tccg) for TCCG for each of the test cases within the benchmark. The tensor contractions within the benchmark are collected from four different publications to cover a broad range of use cases (see paper, Sec. 7.1); that said, we do not claim that this benchmark is exhaustive in any sense. If you think that the benchmark is missing certain tensor contractions or sizes, please feel free to contribute to the benchmark.

Since this benchmark may evolve over time, please refer to the benchmark version below to make comparisons easier.

Benchmark version: v0.1

Current Limitations of GETT


The product of the sizes corresponding to the free indices of each input tensor needs to be a multiple of 24. This limitation will be lifted in a future version of GETT.
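
For example, a small Python check of this restriction for the contraction C[a,b,i,j] = A[i,m,a] * B[m,j,b] from above (a simplified sketch; the index bookkeeping is illustrative and not TCCG code):

    # The free indices of an input tensor are those that also appear in the output C.
    sizes = {'a': 24, 'b': 24, 'i': 24, 'j': 24, 'm': 24}
    C_indices = ['a', 'b', 'i', 'j']
    A_indices = ['i', 'm', 'a']
    B_indices = ['m', 'j', 'b']

    for name, indices in (('A', A_indices), ('B', B_indices)):
        free = [idx for idx in indices if idx in C_indices]
        product = 1
        for idx in free:
            product *= sizes[idx]
        print(name, free, product, product % 24 == 0)   # both products are 576, a multiple of 24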

Citation


If you want to refer to TCCG as part of a research paper, please cite the following article:

@article{tccg2016a,
   author      = {Paul Springer and Paolo Bientinesi},
   title       = {{Design of a high-performance GEMM-like Tensor-Tensor Multiplication}},
   archivePrefix = "arXiv",
   eprint = {1607.00145},
   primaryClass = "quant-ph",
   journal     = {CoRR},
   year        = {2016},
   issue_date  = {July 2016},
   url         = {http://arxiv.org/abs/1607.00145}
}

Changelog


V0.2.0:

Feedback & Contributions


We are happy to receive any feedback or feature requests. Please contact [email protected].

We also welcome any contributions to the code base or the benchmark.