Publications

The Tensor Algebra Compiler
Abstract
Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering, and the physical sciences. Tensors are often sparse, and compound operations must frequently be computed in a single kernel for performance and to save memory. Programmers are left to write kernels for every operation of interest, with different mixes of dense and sparse tensors in different formats. The combinations are infinite, which makes it impossible to manually implement and optimize them all. This paper introduces the first compiler technique to automatically generate kernels for any compound tensor algebra operation on dense and sparse tensors. The technique is implemented in a C++ library called taco. Its performance is competitive with best-in-class hand-optimized kernels in popular libraries, while supporting far more tensor operations.
BibTeX
@article{kjolstad:2017:taco,
  author = {Kjolstad, Fredrik and Kamil, Shoaib and Chou, Stephen and Lugato, David and Amarasinghe, Saman},
  title = {The Tensor Algebra Compiler},
  journal = {Proc. ACM Program. Lang.},
  issue_date = {October 2017},
  volume = {1},
  number = {OOPSLA},
  month = oct,
  year = {2017},
  issn = {2475-1421},
  pages = {77:1--77:29},
  articleno = {77},
  numpages = {29},
  url = {http://doi.acm.org/10.1145/3133901},
  doi = {10.1145/3133901},
  acmid = {3133901},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {code generation, iteration graphs, linear algebra, merge lattices, parallelism, performance, sparse data structures, tensor algebra, tensors}
}
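To make the "compound operations in a single kernel" point concrete, here is a hand-written sketch of the kind of fused kernel the paper targets: a(i) = B(i,j)*c(j) + d(i) with B stored in CSR, computed in one loop nest instead of a separate multiply followed by an add. It is written in Python for readability (taco emits C code), and the pos/crd/vals names follow common CSR conventions rather than taco's generated identifiers.

```python
def spmv_add(num_rows, pos, crd, vals, c, d):
    """Fused kernel for a(i) = B(i,j)*c(j) + d(i), B in CSR (pos/crd/vals).
    Illustrative sketch, not taco output."""
    a = [0.0] * num_rows
    for i in range(num_rows):
        acc = 0.0
        for p in range(pos[i], pos[i + 1]):   # nonzeros of row i
            acc += vals[p] * c[crd[p]]
        a[i] = acc + d[i]                     # addition fused into the same loop
    return a

# 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR
pos, crd, vals = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(spmv_add(2, pos, crd, vals, [1.0, 1.0, 1.0], [10.0, 10.0]))
```

Fusing the addition avoids materializing the intermediate product vector, which is the memory-saving argument the abstract makes.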

taco: A Tool to Generate Tensor Algebra Kernels
Abstract
Tensor algebra is an important computational abstraction that is increasingly used in data analytics, machine learning, engineering, and the physical sciences. However, the number of tensor expressions is unbounded, which makes it hard to develop and optimize libraries. Furthermore, the tensors are often sparse (most components are zero), which means the code has to traverse compressed formats. To support programmers, we have developed taco, a code generation tool that generates dense, sparse, and mixed kernels from tensor algebra expressions. This paper describes the taco web and command-line tools and discusses the benefits of a code generator over a traditional library. See also the demo video at tensor-compiler.org/ase2017.
BibTeX
@inproceedings{kjolstad:2017:tacotool,
  author = {Kjolstad, Fredrik and Chou, Stephen and Lugato, David and Kamil, Shoaib and Amarasinghe, Saman},
  booktitle = {2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)},
  title = {taco: A Tool to Generate Tensor Algebra Kernels},
  year = {2017},
  pages = {943--948},
  keywords = {data analysis;learning (artificial intelligence);mathematics computing;program compilers;software libraries;tensors;code generation tool;code generator;command-line tools;compressed formats;computational abstraction;data analytics;dense kernels;machine learning;mixed kernels;physical sciences;sparse kernels;taco web;tensor algebra expressions;tensor algebra kernels;tensor expressions;Indexes;Kernel;Libraries;Linear algebra;Tensile stress;Tools;Tensor algebra;compiler;linear algebra;sparse},
  doi = {10.1109/ASE.2017.8115709},
  month = {Oct}
}
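The abstract's point that sparse code "has to traverse compressed formats" can be illustrated with the classic two-way merge that sparse addition requires: both operands are stored as sorted coordinate/value arrays, so the kernel must co-iterate them rather than index directly. This is a hand-written Python sketch of that traversal pattern, not code produced by taco.

```python
def sparse_add(crd_b, vals_b, crd_c, vals_c):
    """a = b + c for sparse vectors stored as sorted (coordinate, value)
    arrays; a two-way merge over compressed operands. Illustrative sketch."""
    crd_a, vals_a = [], []
    pb = pc = 0
    while pb < len(crd_b) and pc < len(crd_c):
        ib, ic = crd_b[pb], crd_c[pc]
        i = min(ib, ic)                 # next coordinate present in either operand
        v = (vals_b[pb] if ib == i else 0.0) + (vals_c[pc] if ic == i else 0.0)
        crd_a.append(i)
        vals_a.append(v)
        pb += ib == i                   # advance whichever operands matched i
        pc += ic == i
    crd_a += crd_b[pb:]; vals_a += vals_b[pb:]   # leftover tail of b
    crd_a += crd_c[pc:]; vals_a += vals_c[pc:]   # leftover tail of c
    return crd_a, vals_a
```

Writing this merge by hand for every expression and format combination is exactly the burden the code generator removes.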

Unified Sparse Formats for Tensor Algebra Compilers
Abstract
This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants thereof. With these implementations at hand, our code generator can generate code for any tensor algebra expression on any combination of the aforementioned formats.
To demonstrate our modular code generator design, we have implemented it in the open-source taco tensor algebra compiler. Our evaluation shows that we get better performance by supporting more formats specialized to different tensor structures, and our plugins make it easy to add new formats. For example, when data is provided in the COO format, computing a single matrix-vector multiplication with COO is up to 3.6x faster than with CSR. Furthermore, DIA is specialized to tensor convolutions and stencil operations and therefore performs up to 22% faster than CSR for such operations. To further demonstrate the importance of support for many formats, we show that the best vector format for matrix-vector multiplication varies with input sparsities, from hash maps to sparse vectors to dense vectors. Finally, we show that the performance of generated code for these formats is competitive with hand-optimized implementations.
BibTeX
@article{chou:2018:fmtaco,
  author = {Chou, Stephen and Kjolstad, Fredrik and Amarasinghe, Saman},
  title = {Unified Sparse Formats for Tensor Algebra Compilers},
  journal = {arXiv e-prints},
  archivePrefix = {arXiv},
  eprint = {1804.10112},
  primaryClass = {cs.MS},
  keywords = {Computer Science - Mathematical Software, Computer Science - Programming Languages},
  year = {2018},
  month = apr,
  url = {https://arxiv.org/abs/1804.10112}
}
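Two of the formats the paper unifies, COO and CSR, can be compared on the same small matrix; the data below uses the standard pos/crd array names for CSR, not taco's format interface. The conversion routine is a minimal sketch assuming the COO entries are already row-sorted.

```python
# The same 3x4 matrix in two formats:
#   [[5, 0, 0, 0],
#    [0, 0, 7, 8],
#    [0, 0, 0, 9]]
coo = {"rows": [0, 1, 1, 2], "cols": [0, 2, 3, 3], "vals": [5, 7, 8, 9]}
csr = {"pos": [0, 1, 3, 4], "crd": [0, 2, 3, 3], "vals": [5, 7, 8, 9]}

def coo_to_csr(num_rows, rows, cols, vals):
    """Build CSR from row-sorted COO: count nonzeros per row, then
    prefix-sum the counts into the pos array."""
    pos = [0] * (num_rows + 1)
    for r in rows:
        pos[r + 1] += 1
    for i in range(num_rows):
        pos[i + 1] += pos[i]
    return {"pos": pos, "crd": list(cols), "vals": list(vals)}

print(coo_to_csr(3, coo["rows"], coo["cols"], coo["vals"]) == csr)  # True
```

COO keeps one explicit row index per nonzero (convenient for loading data), while CSR compresses the row dimension into pos, which is part of why the best format depends on the operation and the data, as the evaluation above shows.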

Sparse Tensor Algebra Optimizations with Workspaces
Abstract
This paper shows how to optimize sparse tensor algebraic expressions by introducing temporary tensors, called workspaces, into the resulting loop nests. We develop a new intermediate language for tensor operations called concrete index notation that extends tensor index notation. Concrete index notation expresses when and where subcomputations occur and what tensor they are stored into. We then describe the workspace optimization in this language, and how to compile it to sparse code by building on prior work in the literature.
We demonstrate the importance of the optimization on several important sparse tensor kernels, including sparse matrix-matrix multiplication (SpMM), sparse tensor addition (SpAdd), and the matricized tensor times Khatri-Rao product (MTTKRP) used to factorize tensors. Our results show improvements over prior work on tensor algebra compilation and bring the performance of these kernels on par with state-of-the-art hand-optimized implementations. For example, SpMM was not supported by prior tensor algebra compilers, the performance of MTTKRP on the nell-2 data set improves by 35%, and MTTKRP can for the first time have sparse results.
BibTeX
@article{kjolstad:2018:workspaces,
  author = {Kjolstad, Fredrik and Ahrens, Peter and Kamil, Shoaib and Amarasinghe, Saman},
  title = {Sparse Tensor Algebra Optimizations with Workspaces},
  journal = {arXiv e-prints},
  archivePrefix = {arXiv},
  eprint = {1802.10574},
  primaryClass = {cs.MS},
  keywords = {Computer Science - Mathematical Software, Computer Science - Programming Languages},
  year = {2018},
  month = apr,
  url = {https://arxiv.org/abs/1802.10574}
}
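The shape of the workspace optimization can be sketched on the simplest case, sparse vector addition: instead of merging two compressed operands, the kernel scatters both into a dense temporary (the workspace) and then gathers the nonzeros back out. This Python sketch is an illustration of the idea under common CSR-style naming conventions, not taco's generated code or its concrete index notation.

```python
def add_with_workspace(n, crd_b, vals_b, crd_c, vals_c):
    """a = b + c via a dense workspace of length n: scatter both sparse
    operands, then gather the touched coordinates in order. Sketch only."""
    w = [0.0] * n            # dense workspace, one slot per coordinate
    seen = set()             # coordinates scattered into the workspace
    for p, i in enumerate(crd_b):
        w[i] += vals_b[p]
        seen.add(i)
    for p, i in enumerate(crd_c):
        w[i] += vals_c[p]
        seen.add(i)
    crd_a = sorted(seen)     # gather the result back into sorted sparse form
    return crd_a, [w[i] for i in crd_a]
```

The scatter loops need no co-iteration logic at all, trading O(n) temporary storage for much simpler inner loops; the same trade is what makes kernels like SpMM expressible in the compiler.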