Publications

The Tensor Algebra Compiler
Abstract
Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering, and the physical sciences. Tensors are often sparse, and compound operations must frequently be computed in a single kernel for performance and to save memory. Programmers are left to write kernels for every operation of interest, with different mixes of dense and sparse tensors in different formats. The combinations are infinite, which makes it impossible to manually implement and optimize them all. This paper introduces the first compiler technique to automatically generate kernels for any compound tensor algebra operation on dense and sparse tensors. The technique is implemented in a C++ library called taco. Its performance is competitive with best-in-class hand-optimized kernels in popular libraries, while supporting far more tensor operations.
BibTeX
@article{kjolstad:2017:taco,
  author = {Kjolstad, Fredrik and Kamil, Shoaib and Chou, Stephen and Lugato, David and Amarasinghe, Saman},
  title = {The Tensor Algebra Compiler},
  journal = {Proc. ACM Program. Lang.},
  issue_date = {October 2017},
  volume = {1},
  number = {OOPSLA},
  month = oct,
  year = {2017},
  issn = {2475-1421},
  pages = {77:1--77:29},
  articleno = {77},
  numpages = {29},
  url = {http://doi.acm.org/10.1145/3133901},
  doi = {10.1145/3133901},
  acmid = {3133901},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {code generation, iteration graphs, linear algebra, merge lattices, parallelism, performance, sparse data structures, tensor algebra, tensors}
}
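The compound kernels this paper describes can be sketched in plain Python. This is an illustrative sketch, not taco's generated code; the `pos`/`crd`/`vals` naming follows the common CSR convention, and the function name is hypothetical. It computes a(i) = B(i,j)*c(j) + d(i) in a single fused pass, with B in CSR, without materializing the intermediate product B*c.

```python
# Sketch of a fused kernel for a(i) = B(i,j)*c(j) + d(i), with B in CSR
# (pos/crd index arrays plus a values array), c and d dense vectors.
def spmv_plus_vector(B_pos, B_crd, B_vals, c, d):
    n = len(d)
    a = [0.0] * n
    for i in range(n):                           # loop over rows of B
        acc = 0.0
        for p in range(B_pos[i], B_pos[i + 1]):  # only stored entries of row i
            acc += B_vals[p] * c[B_crd[p]]
        a[i] = acc + d[i]                        # add d(i) in the same pass
    return a

# B = [[2, 0, 1],
#      [0, 0, 0],
#      [0, 3, 0]] in CSR form:
B_pos, B_crd, B_vals = [0, 2, 2, 3], [0, 2, 1], [2.0, 1.0, 3.0]
print(spmv_plus_vector(B_pos, B_crd, B_vals, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))
```

Writing such a kernel by hand is easy for one format mix, but every new combination of formats and operations needs its own variant, which is the scaling problem the paper's code generator addresses.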

taco: A Tool to Generate Tensor Algebra Kernels
Abstract
Tensor algebra is an important computational abstraction that is increasingly used in data analytics, machine learning, engineering, and the physical sciences. However, the number of tensor expressions is unbounded, which makes it hard to develop and optimize libraries. Furthermore, the tensors are often sparse (most components are zero), which means the code has to traverse compressed formats. To support programmers, we have developed taco, a code generation tool that generates dense, sparse, and mixed kernels from tensor algebra expressions. This paper describes the taco web and command-line tools and discusses the benefits of a code generator over a traditional library. See also the demo video at tensorcompiler.org/ase2017.
BibTeX
@inproceedings{kjolstad:2017:tacotool,
  author = {Kjolstad, Fredrik and Chou, Stephen and Lugato, David and Kamil, Shoaib and Amarasinghe, Saman},
  booktitle = {2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)},
  title = {taco: A Tool to Generate Tensor Algebra Kernels},
  year = {2017},
  pages = {943--948},
  keywords = {data analysis;learning (artificial intelligence);mathematics computing;program compilers;software libraries;tensors;code generation tool;code generator;command-line tools;compressed formats;computational abstraction;data analytics;dense kernels;machine learning;mixed kernels;physical sciences;sparse kernels;taco web;tensor algebra expressions;tensor algebra kernels;tensor expressions;Indexes;Kernel;Libraries;Linear algebra;Tensile stress;Tools;Tensor algebra;compiler;linear algebra;sparse},
  doi = {10.1109/ASE.2017.8115709},
  month = {Oct}
}
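A "mixed" kernel of the kind the abstract mentions, combining a sparse operand with dense ones, can be sketched as follows. This is an illustrative Python sketch, not actual tool output, and all names are hypothetical: it computes C(i,k) = A(i,j)*B(j,k) with A in CSR and B, C dense.

```python
# Sketch of a mixed kernel C(i,k) = A(i,j) * B(j,k),
# with A sparse in CSR and B, C dense (lists of rows).
def spmm_csr_dense(A_pos, A_crd, A_vals, B, n_rows):
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for p in range(A_pos[i], A_pos[i + 1]):  # stored entries of row i
            j, v = A_crd[p], A_vals[p]
            for k in range(n_cols):              # dense inner loop over B's columns
                C[i][k] += v * B[j][k]
    return C
```

A traditional library must ship a separate routine like this for each format combination it supports; a code generator emits the variant on demand from the expression and format description.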

Format Abstraction for Sparse Tensor Algebra Compilers
Abstract
This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants thereof. With these implementations at hand, our code generator can generate code to compute any tensor algebra expression on any combination of the aforementioned formats.
To demonstrate our technique, we have implemented it in the taco tensor algebra compiler. Our modular code generator design makes it simple to add support for new tensor formats, and the performance of the generated code is competitive with hand-optimized implementations. Furthermore, by extending taco to support a wider range of formats specialized for different application and data characteristics, we can improve end-user application performance. For example, if input data is provided in the COO format, our technique allows computing a single matrix-vector multiplication directly with the data in COO, which is up to 3.6× faster than first converting the data to CSR.
BibTeX
@article{chou:2018:formats,
  author = {Chou, Stephen and Kjolstad, Fredrik and Amarasinghe, Saman},
  title = {Format Abstraction for Sparse Tensor Algebra Compilers},
  journal = {Proc. ACM Program. Lang.},
  issue_date = {November 2018},
  volume = {2},
  number = {OOPSLA},
  month = oct,
  year = {2018},
  issn = {2475-1421},
  pages = {123:1--123:30},
  articleno = {123},
  numpages = {30},
  url = {http://doi.acm.org/10.1145/3276493},
  doi = {10.1145/3276493},
  acmid = {3276493},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {modular code generation, sparse tensor algebra compilation, tensor formats}
}
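The abstract's COO example can be sketched: a matrix-vector multiplication y(i) = A(i,j)*x(j) computed directly on coordinate-format data in a single pass, with no prior conversion to CSR. This is an illustrative Python sketch, not taco's generated code, and the names are hypothetical.

```python
# Sketch of SpMV directly on COO data: A is stored as three parallel
# arrays (row index, column index, value), one entry per nonzero.
def spmv_coo(rows, cols, vals, x, n_rows):
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):  # one pass over the stored entries
        y[r] += v * x[c]                   # scatter into the output row
    return y
```

Because COO rows need not be grouped, the kernel scatters into y rather than accumulating a row at a time as a CSR kernel would, which is exactly the kind of format-specific iteration the paper's capability interface lets the compiler derive.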

Sparse Tensor Algebra Optimizations with Workspaces
Abstract
This paper shows how to optimize sparse tensor algebraic expressions by introducing temporary tensors, called workspaces, into the resulting loop nests. We develop a new intermediate language for tensor operations called concrete index notation that extends tensor index notation. Concrete index notation expresses when and where subcomputations occur and what tensor they are stored into. We then describe the workspace optimization in this language, and how to compile it to sparse code by building on prior work in the literature.
We demonstrate the importance of the optimization on several important sparse tensor kernels, including sparse matrix-matrix multiplication (SpMM), sparse tensor addition (SpAdd), and the matricized tensor times Khatri-Rao product (MTTKRP) used to factorize tensors. Our results show improvements over prior work on tensor algebra compilation and bring the performance of these kernels on par with state-of-the-art hand-optimized implementations. For example, SpMM was not supported by prior tensor algebra compilers, the performance of MTTKRP on the nell-2 data set improves by 35%, and MTTKRP can for the first time have sparse results.
BibTeX
@article{kjolstad:2018:workspaces,
  author = {Kjolstad, Fredrik and Ahrens, Peter and Kamil, Shoaib and Amarasinghe, Saman},
  title = {Sparse Tensor Algebra Optimizations with Workspaces},
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1802.10574},
  primaryClass = "cs.MS",
  keywords = {Computer Science - Mathematical Software, Computer Science - Programming Languages},
  year = 2018,
  month = apr,
  url = {https://arxiv.org/abs/1802.10574}
}
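The workspace idea can be sketched for SpMM: each row of the result C = A*B is accumulated into a dense temporary (the workspace) and then scattered back into the sparse output. This is an illustrative Python sketch under the assumption of CSR inputs and output, not the compiler's generated code; all names are hypothetical.

```python
# Sketch of SpMM C = A * B with A, B, C in CSR, using a dense
# workspace w to accumulate one output row at a time.
def spmm_workspace(A_pos, A_crd, A_vals, B_pos, B_crd, B_vals, n_rows, n_cols):
    C_pos, C_crd, C_vals = [0], [], []
    w = [0.0] * n_cols                 # dense workspace, reused across rows
    for i in range(n_rows):
        seen = set()                   # columns touched in this row
        for pA in range(A_pos[i], A_pos[i + 1]):
            j, vA = A_crd[pA], A_vals[pA]
            for pB in range(B_pos[j], B_pos[j + 1]):
                k = B_crd[pB]
                seen.add(k)
                w[k] += vA * B_vals[pB]   # random access: cheap because w is dense
        for k in sorted(seen):         # scatter the workspace into sparse C
            C_crd.append(k)
            C_vals.append(w[k])
            w[k] = 0.0                 # reset the workspace for the next row
        C_pos.append(len(C_crd))
    return C_pos, C_crd, C_vals
```

The dense workspace makes the inner accumulation a cheap random-access update; without a temporary, inserting into the sparse output inside the innermost loop would be far more expensive.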