Publications

The Tensor Algebra Compiler
Abstract
Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering and the physical sciences. Tensors are often sparse and compound operations must frequently be computed in a single kernel for performance and to save memory. Programmers are left to write kernels for every operation of interest, with different mixes of dense and sparse tensors in different formats. The combinations are infinite, which makes it impossible to manually implement and optimize them all. This paper introduces the first compiler technique to automatically generate kernels for any compound tensor algebra operation on dense and sparse tensors. The technique is implemented in a C++ library called taco. Its performance is competitive with best-in-class hand-optimized kernels in popular libraries, while supporting far more tensor operations.
BibTeX
@article{kjolstad:2017:taco,
  author = {Kjolstad, Fredrik and Kamil, Shoaib and Chou, Stephen and Lugato, David and Amarasinghe, Saman},
  title = {The Tensor Algebra Compiler},
  journal = {Proc. ACM Program. Lang.},
  issue_date = {October 2017},
  volume = {1},
  number = {OOPSLA},
  month = oct,
  year = {2017},
  issn = {2475-1421},
  pages = {77:1--77:29},
  articleno = {77},
  numpages = {29},
  url = {http://doi.acm.org/10.1145/3133901},
  doi = {10.1145/3133901},
  acmid = {3133901},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {code generation, iteration graphs, linear algebra, merge lattices, parallelism, performance, sparse data structures, tensor algebra, tensors}
}
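To make the abstract concrete: taco generates kernels like the classic sparse-matrix-times-vector loop below. This is a minimal hand-written Python sketch of that kind of kernel (illustrative names only; it is not taco's actual generated output), assuming the matrix is stored in CSR as a row-pointer array, a column-index array, and a value array:

```python
def spmv_csr(pos, crd, vals, x, num_rows):
    """Compute y = A @ x where A is given as CSR arrays.

    pos  -- row pointers, length num_rows + 1
    crd  -- column indices of the stored nonzeros
    vals -- values of the stored nonzeros
    """
    y = [0.0] * num_rows
    for i in range(num_rows):
        # Iterate only over the stored (nonzero) entries of row i,
        # skipping the zeros entirely -- the point of a sparse kernel.
        for p in range(pos[i], pos[i + 1]):
            y[i] += vals[p] * x[crd[p]]
    return y
```

The compiler's contribution is producing loops of this shape automatically for any compound expression and any mix of dense and sparse operand formats, rather than requiring one such kernel to be written by hand per combination.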

taco: A Tool to Generate Tensor Algebra Kernels
Abstract
Tensor algebra is an important computational abstraction that is increasingly used in data analytics, machine learning, engineering, and the physical sciences. However, the number of tensor expressions is unbounded, which makes it hard to develop and optimize libraries. Furthermore, the tensors are often sparse (most components are zero), which means the code has to traverse compressed formats. To support programmers we have developed taco, a code generation tool that generates dense, sparse, and mixed kernels from tensor algebra expressions. This paper describes the taco web and command-line tools and discusses the benefits of a code generator over a traditional library. See also the demo video at tensor-compiler.org/ase2017.
BibTeX
@inproceedings{kjolstad:2017:tacotool,
  author = {Kjolstad, Fredrik and Chou, Stephen and Lugato, David and Kamil, Shoaib and Amarasinghe, Saman},
  booktitle = {2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)},
  title = {taco: A Tool to Generate Tensor Algebra Kernels},
  year = {2017},
  pages = {943--948},
  keywords = {data analysis;learning (artificial intelligence);mathematics computing;program compilers;software libraries;tensors;code generation tool;code generator;command-line tools;compressed formats;computational abstraction;data analytics;dense kernels;machine learning;mixed kernels;physical sciences;sparse kernels;taco web;tensor algebra expressions;tensor algebra kernels;tensor expressions;Indexes;Kernel;Libraries;Linear algebra;Tensile stress;Tools;Tensor algebra;compiler;linear algebra;sparse},
  doi = {10.1109/ASE.2017.8115709},
  month = {Oct}
}

Format Abstraction for Sparse Tensor Algebra Compilers
Abstract
This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants thereof. With these implementations at hand, our code generator can generate code to compute any tensor algebra expression on any combination of the aforementioned formats.
To demonstrate our technique, we have implemented it in the taco tensor algebra compiler. Our modular code generator design makes it simple to add support for new tensor formats, and the performance of the generated code is competitive with hand-optimized implementations. Furthermore, by extending taco to support a wider range of formats specialized for different application and data characteristics, we can improve end-user application performance. For example, if input data is provided in the COO format, our technique allows computing a single matrix-vector multiplication directly with the data in COO, which is up to 3.6× faster than by first converting the data to CSR.
BibTeX
@article{chou:2018:formats,
  author = {Chou, Stephen and Kjolstad, Fredrik and Amarasinghe, Saman},
  title = {Format Abstraction for Sparse Tensor Algebra Compilers},
  journal = {Proc. ACM Program. Lang.},
  issue_date = {November 2018},
  volume = {2},
  number = {OOPSLA},
  month = oct,
  year = {2018},
  issn = {2475-1421},
  pages = {123:1--123:30},
  articleno = {123},
  numpages = {30},
  url = {http://doi.acm.org/10.1145/3276493},
  doi = {10.1145/3276493},
  acmid = {3276493},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {modular code generation, sparse tensor algebra compilation, tensor formats}
}
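The COO example in the abstract above, computing a matrix-vector product directly on COO data with no conversion to CSR, can be sketched as follows. This is a minimal illustration with hypothetical names, not the code taco generates; it assumes the matrix is stored as parallel row-index, column-index, and value arrays:

```python
def spmv_coo(rows, cols, vals, x, num_rows):
    """Compute y = A @ x where A is given as COO triplets."""
    y = [0.0] * num_rows
    # Each stored nonzero (r, c, v) contributes v * x[c] to y[r].
    # No row-pointer structure is needed, so the COO input is used as-is.
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y
```

Because the loop consumes the triplet arrays directly, the one-time cost of building CSR row pointers is avoided, which is where the paper's reported speedup for a single multiplication comes from.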

Tensor Algebra Compilation with Workspaces
Abstract
This paper shows how to extend sparse tensor algebra compilers to introduce temporary tensors called workspaces to avoid inefficient sparse data structure accesses. We develop an intermediate representation (IR) for tensor operations called concrete index notation that specifies when subcomputations occur and where they are stored. We then describe the workspace transformation in this IR, how to programmatically invoke it, and how the IR is compiled to sparse code. Finally, we show how the transformation can be used to optimize sparse tensor kernels, including sparse matrix multiplication, sparse tensor addition, and the matricized tensor times Khatri-Rao product (MTTKRP).
Our results show that the workspace transformation brings the performance of these kernels on par with hand-optimized implementations. For example, we improve the performance of MTTKRP with dense output by up to 35%, and enable generating sparse matrix multiplication and MTTKRP with sparse output, neither of which were supported by prior tensor algebra compilers.
BibTeX
@inproceedings{kjolstad:2018:workspaces,
  author = {Kjolstad, Fredrik and Ahrens, Peter and Kamil, Shoaib and Amarasinghe, Saman},
  title = {Tensor Algebra Compilation with Workspaces},
  booktitle = {Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization},
  series = {CGO 2019},
  year = {2019},
  isbn = {978-1-7281-1436-1},
  location = {Washington, DC, USA},
  pages = {180--192},
  numpages = {13},
  url = {http://dl.acm.org/citation.cfm?id=3314872.3314894},
  acmid = {3314894},
  publisher = {IEEE Press},
  address = {Piscataway, NJ, USA},
  keywords = {code optimization, concrete index notation, sparse tensor algebra, temporaries, workspaces}
}
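The workspace idea described in the abstract can be illustrated on sparse matrix multiplication, one of the kernels the paper optimizes. The sketch below (hypothetical names, a plain-Python illustration rather than the paper's generated code) computes C = A @ B with both operands in CSR: instead of merging sparse row structures inside the inner loop, each row of C is scattered into a dense temporary array, the workspace, and then gathered back into sparse form:

```python
def spgemm_csr(A_pos, A_crd, A_vals, B_pos, B_crd, B_vals, num_rows, num_cols):
    """Compute C = A @ B in CSR using a dense row workspace."""
    C_pos, C_crd, C_vals = [0], [], []
    workspace = [0.0] * num_cols       # dense temporary, reused for every row
    occupied = [False] * num_cols      # which workspace slots row i has touched
    for i in range(num_rows):
        nz = []
        for pA in range(A_pos[i], A_pos[i + 1]):
            k = A_crd[pA]
            # Scatter A(i,k) * B(k,:) into the dense workspace: random access
            # into a dense array replaces an expensive sparse-sparse merge.
            for pB in range(B_pos[k], B_pos[k + 1]):
                j = B_crd[pB]
                if not occupied[j]:
                    occupied[j] = True
                    nz.append(j)
                workspace[j] += A_vals[pA] * B_vals[pB]
        # Gather the touched entries back into sparse C and reset the workspace.
        for j in sorted(nz):
            C_crd.append(j)
            C_vals.append(workspace[j])
            workspace[j] = 0.0
            occupied[j] = False
        C_pos.append(len(C_crd))
    return C_pos, C_crd, C_vals
```

The design point mirrors the paper's argument: the workspace trades a small dense temporary (one row wide) for the elimination of repeated sparse data structure merges in the innermost loop.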