Publications

2020

  • A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra  Ryan Senanayake, Changwan Hong, Ziheng Wang, Amalee Wilson, Stephen Chou, Shoaib Kamil, Saman Amarasinghe, and Fredrik Kjolstad
    Abstract
    We address the problem of optimizing sparse tensor algebra in a compiler and show how to define standard loop transformations (split, collapse, and reorder) on sparse iteration spaces. The key idea is to track the transformation functions that map the original iteration space to derived iteration spaces. These functions are needed by the code generator to emit code that maps coordinates between iteration spaces at runtime, since the coordinates in the sparse data structures remain in the original iteration space. We further demonstrate that derived iteration spaces can tile both the universe of coordinates and the subset of nonzero coordinates: the former is analogous to tiling dense iteration spaces, while the latter tiles sparse iteration spaces into statically load-balanced blocks of nonzeros. Tiling the space of nonzeros lets the generated code efficiently exploit heterogeneous compute resources such as threads, vector units, and GPUs.

    We implement these concepts by extending the sparse iteration theory implementation in the TACO system. The associated scheduling API can be used by performance engineers or it can be the target of an automatic scheduling system. We outline one heuristic autoscheduling system, but other systems are possible. Using the scheduling API, we show how to optimize mixed sparse-dense tensor algebra expressions on CPUs and GPUs. Our results show that the sparse transformations are sufficient to generate code with performance competitive with hand-optimized implementations from the literature, while generalizing to all of tensor algebra.
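    Example
    A minimal sketch of the scheduling API applied to SpMV, following the paper's position-space split. It assumes TACO's C++ interface and the transformation names from the paper (concretize, fuse, pos, split, parallelize); exact signatures may drift between TACO versions.
      #include "taco.h"
      using namespace taco;

      int main() {
        Format csr({Dense, Sparse});
        Format dv({Dense});
        Tensor<double> A("A", {1024, 1024}, csr);
        Tensor<double> x("x", {1024}, dv);
        Tensor<double> y("y", {1024}, dv);
        // (insert nonzeros into A and values into x, then pack both)

        IndexVar i("i"), j("j"), f("f"), fpos("fpos"), chunk("chunk"), fpos2("fpos2");
        y(i) = A(i, j) * x(j);  // SpMV in tensor index notation

        // Derive a schedule: fuse the loops, map them to the position space
        // of A's nonzeros, split that space into equal-sized (statically
        // load-balanced) chunks, and parallelize over the chunks.
        IndexStmt stmt = y.getAssignment().concretize();
        stmt = stmt.fuse(i, j, f)
                   .pos(f, fpos, A(i, j))
                   .split(fpos, chunk, fpos2, 64)
                   .parallelize(chunk, ParallelUnit::CPUThread,
                                OutputRaceStrategy::Atomics);

        y.compile(stmt);
        y.assemble();
        y.compute();
      }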
    BibTeX
    @article{senanayake:2020:scheduling,
      author = {Senanayake, Ryan and Hong, Changwan and Wang, Ziheng and Wilson, Amalee and Chou, Stephen and Kamil, Shoaib and Amarasinghe, Saman and Kjolstad, Fredrik},
      title = {A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra},
      year = {2020},
      issue_date = {November 2020},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      volume = {4},
      number = {OOPSLA},
      url = {https://doi.org/10.1145/3428226},
      doi = {10.1145/3428226},
      journal = {Proc. ACM Program. Lang.},
      month = nov,
      articleno = {158},
      numpages = {30},
      keywords = {Sparse Tensor Algebra, Sparse Iteration Spaces, Optimizing Transformations}
    }
  • Automatic Generation of Efficient Sparse Tensor Format Conversion Routines  Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe
    Abstract
    This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor’s nonzeros in memory. This lets a compiler emit code that performs complex remappings of nonzeros when converting between formats. We also develop a query language that can extract statistics about sparse tensors, and we show how to emit efficient analysis code that computes such queries. Finally, we define an abstract interface that captures how data structures for storing a tensor can be efficiently assembled given specific statistics about the tensor. Disparate formats can implement this common interface, thus letting a compiler emit optimized sparse tensor conversion code for arbitrary combinations of many formats without hard-coding for any specific combination.

    Our evaluation shows that the technique generates sparse tensor conversion routines with performance between 1.00 and 2.01× that of hand-optimized versions in SPARSKIT and Intel MKL, two popular sparse linear algebra libraries. And by emitting code that avoids materializing temporaries, which both libraries need for many combinations of source and target formats, our technique outperforms those libraries by 1.78 to 4.01× for CSC/COO to DIA/ELL conversion.
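    Example
    To make the three phases concrete, here is a hand-written COO-to-CSR conversion factored the same way. This is plain illustrative C++, not code the technique generates; the generated code also handles remappings between formats whose nonzero orders disagree, which this identity case sidesteps.
      #include <vector>

      struct Coo { std::vector<int> row, col; std::vector<double> val; };
      struct Csr { std::vector<int> pos, crd; std::vector<double> val; };

      // Assumes A's nonzeros are already ordered row-major; otherwise the
      // remapping phase would first sort them into CSR's nonzero order.
      Csr cooToCsr(const Coo& A, int nrows) {
        // Phase 1: coordinate remapping (identity here, per the assumption).
        // Phase 2: analysis -- query the number of nonzeros in each row.
        std::vector<int> rowCounts(nrows, 0);
        for (int r : A.row) rowCounts[r]++;
        // Phase 3: assembly -- build pos by prefix sum, then place crd/val.
        Csr B;
        B.pos.assign(nrows + 1, 0);
        for (int r = 0; r < nrows; r++) B.pos[r + 1] = B.pos[r] + rowCounts[r];
        B.crd.resize(A.col.size());
        B.val.resize(A.val.size());
        std::vector<int> next(B.pos.begin(), B.pos.end() - 1);
        for (size_t p = 0; p < A.row.size(); p++) {
          int dst = next[A.row[p]]++;
          B.crd[dst] = A.col[p];
          B.val[dst] = A.val[p];
        }
        return B;
      }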
    BibTeX
    @inproceedings{chou:2020:conversion,
      author = {Chou, Stephen and Kjolstad, Fredrik and Amarasinghe, Saman},
      title = {Automatic Generation of Efficient Sparse Tensor Format Conversion Routines},
      year = {2020},
      isbn = {9781450376136},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      url = {https://doi.org/10.1145/3385412.3385963},
      doi = {10.1145/3385412.3385963},
      booktitle = {Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation},
      pages = {823--838},
      numpages = {16},
      keywords = {coordinate remapping notation, attribute query language, sparse tensor formats, sparse tensor conversion, sparse tensor assembly, sparse tensor algebra},
      location = {London, UK},
      series = {PLDI 2020}
    }
  • Sparse Tensor Transpositions  Suzanne Mueller, Peter Ahrens, Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe
    Abstract
    We present a new algorithm for transposing sparse tensors called Quesadilla. The algorithm converts the sparse tensor data structure to a list of coordinates and sorts it with a fast multi-pass radix algorithm that exploits knowledge of the requested transposition and the tensor's input partial coordinate ordering to provably minimize the number of parallel partial sorting passes. We evaluate both a serial and a parallel implementation of Quesadilla on a set of 19 tensors from the FROSTT collection, a set of tensors taken from scientific and data analytic applications. We compare Quesadilla and a generalization, Top-2-sadilla, to several state-of-the-art approaches, including the tensor transposition routine used in the SPLATT tensor factorization library. In serial tests, Quesadilla was the best strategy for 60% of all tensor and transposition combinations and improved over SPLATT by at least 19% in half of the combinations. In parallel tests, at least one of Quesadilla or Top-2-sadilla was the best strategy for 52% of all tensor and transposition combinations.
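    Example
    The multi-pass strategy is assembled from stable per-mode sorting passes like the counting sort below; Quesadilla's contribution is choosing, from the input's partial order, a provably minimal sequence of such passes. Illustrative C++, not the paper's implementation.
      #include <array>
      #include <vector>

      using Coord = std::array<int, 3>;  // one coordinate of a 3-tensor

      // Stably reorder a coordinate list by a single mode.
      void countingPass(std::vector<Coord>& nz, std::vector<double>& vals,
                        int mode, int dimSize) {
        std::vector<int> count(dimSize + 1, 0);
        for (const Coord& c : nz) count[c[mode] + 1]++;
        for (int d = 0; d < dimSize; d++) count[d + 1] += count[d];
        std::vector<Coord> nz2(nz.size());
        std::vector<double> vals2(vals.size());
        for (size_t p = 0; p < nz.size(); p++) {  // stable scatter
          int dst = count[nz[p][mode]]++;
          nz2[dst] = nz[p];
          vals2[dst] = vals[p];
        }
        nz.swap(nz2);
        vals.swap(vals2);
      }
      // Passes run from the least- to the most-significant mode of the
      // requested transposition, skipping modes the input already orders.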
    BibTeX
    @inproceedings{mueller:2020:transposition,
      author = {Mueller, Suzanne and Ahrens, Peter and Chou, Stephen and Kjolstad, Fredrik and Amarasinghe, Saman},
      title = {Sparse Tensor Transpositions},
      year = {2020},
      isbn = {9781450369350},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      url = {https://doi.org/10.1145/3350755.3400245},
      doi = {10.1145/3350755.3400245},
      booktitle = {Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures},
      pages = {559--561},
      numpages = {3},
      keywords = {COO, sorting, radix sort, transposition, sparse tensors},
      location = {Virtual Event, USA},
      series = {SPAA '20}
    }
  • A Framework for Computing on Sparse Tensors based on Operator Properties  Rawn Henry
    Abstract
    Tensor operations have traditionally been limited to addition and multiplication. For operations on sparse tensors, these semantics were extended to account for the fact that tensors usually omit zero values. However, there are many operators with a rich semantics of operator properties that can be used in dense and sparse tensor computations.

    This work addresses the problem of generating code for computing on a mix of sparse and dense tensors based on the properties of the operators on those tensors. I introduce the concept of a fill value for each tensor so that the data can be sparse with respect to values other than zero. I show how to reason about the operator properties, along with the fill values of the input tensors, in order to construct an IR describing how to iterate over these tensors. I show how we can take advantage of the operator properties to perform useful optimizations for both iterating over tensors and performing reductions. Lastly, I show how a user can leverage set notation to directly describe to a compiler how it should iterate over sparse tensors.

    The ideas discussed in this work have been prototyped in the open-source TACO system. The API requires operator properties and tensor fill values to be explicitly provided by the user; however, it makes the TACO system much more flexible. I show how the primitives exposed in this work allow one to efficiently perform several graph algorithms by drawing on the literature about GraphBLAS. In the evaluation section, we benchmark this system against the SuiteSparse implementation of GraphBLAS on a variety of graph algorithms to demonstrate its performance.
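    Example
    A sketch of the core reasoning for a binary operator f: if the fill value annihilates f (f(x, fill) = fill, as with multiplication and fill 0), the loop only needs to coiterate over the intersection of the inputs' non-fill coordinates; otherwise it must cover their union. Hand-written C++ illustration, not the thesis's generated code.
      #include <vector>

      struct SparseVec { std::vector<int> crd; std::vector<double> val; };

      // c(i) = a(i) * b(i) with fill 0: multiplication annihilates on 0,
      // so only coordinates present in both inputs can be non-fill.
      void mulIntersect(const SparseVec& a, const SparseVec& b, SparseVec& c) {
        size_t pa = 0, pb = 0;
        while (pa < a.crd.size() && pb < b.crd.size()) {
          if (a.crd[pa] < b.crd[pb]) pa++;
          else if (b.crd[pb] < a.crd[pa]) pb++;
          else {
            c.crd.push_back(a.crd[pa]);
            c.val.push_back(a.val[pa] * b.val[pb]);
            pa++; pb++;
          }
        }
      }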
    BibTeX
    @MastersThesis{henry:2020:thesis,
      title = "A Framework for Computing on Sparse Tensors based on Operator Properties",
      author = "Rawn Henry",
      month = "May",
      year = "2020",
      url = "http://groups.csail.mit.edu/commit/papers/2020/rawn_2020.pdf",
      type = "M.Eng. Thesis",
      address = "Cambridge, MA",
      school = "Massachusetts Institute of Technology",
    }
  • Automatic Optimization of Sparse Tensor Algebra Programs  Ziheng Wang
    Abstract
    In this thesis, I attempt to give some guidance on how to automatically optimize programs using a domain-specific language (DSL) compiler that exposes a set of scheduling commands. These DSLs have proliferated of late and include Halide, TACO, Tiramisu, and TVM, to name a few. The scheduling commands allow succinct expression of the programmer’s desire to perform certain loop transformations, such as strip-mining, tiling, collapsing, and parallelization, which the compiler proceeds to carry out. I explore whether we can automatically generate schedules with good performance.
    The main difficulty in searching for good schedules is the astronomical number of valid schedules for a particular program. I describe a system that generates a list of candidate schedules through a set of modular stages. Different optimization decisions are made at each stage to trim down the number of schedules considered. I argue that certain sequences of scheduling commands are equivalent in their effect of partitioning the iteration space, and introduce heuristics that limit the number of permutations of variables. I implement these ideas for the open-source TACO system. I demonstrate several orders of magnitude reduction in the effective schedule search space. I also show that for most of the problems considered, we can generate schedules that perform as well as or better than hand-tuned schedules.
    BibTeX
    @MastersThesis{wang:2020:thesis,
      title = "Automatic Optimization of Sparse Tensor Algebra Programs",
      author = "Ziheng Wang",
      month = "May",
      year = "2020",
      url = "http://groups.csail.mit.edu/commit/papers/2020/tony_2020.pdf",
      type = "M.Eng. Thesis",
      address = "Cambridge, MA",
      school = "Massachusetts Institute of Technology",
    }
  • A Unified Iteration Space Transformation Framework for Sparse and Dense Tensor Algebra  Ryan Senanayake
    Abstract
    This work addresses the problem of optimizing mixed sparse and dense tensor algebra in a compiler. I show that standard loop transformations, such as strip-mining, tiling, collapsing, parallelization and vectorization, can be applied to irregular loops over sparse iteration spaces. I also show how these transformations can be applied to the contiguous value arrays of sparse tensor data structures, which I call their position spaces, to unlock load-balanced tiling and parallelism.
    These concepts have been prototyped in the open-source TACO system, where they are exposed as a scheduling API similar to the Halide domain-specific language for dense computations. Using this scheduling API, I show how to optimize mixed sparse/dense tensor algebra expressions, how to generate load-balanced code by scheduling sparse tensor algebra in position space, and how to generate sparse tensor algebra GPU code. As shown in the evaluation, these transformations allow us to generate code that is competitive with many hand-optimized implementations from the literature.
    BibTeX
    @MastersThesis{senanayake:2020:thesis,
      title = "A Unified Iteration Space Transformation Framework for Sparse and Dense Tensor Algebra",
      author = "Ryan Senanayake",
      month = "Feb",
      year = "2020",
      url = "http://groups.csail.mit.edu/commit/papers/2020/ryan_2020.pdf",
      type = "M.Eng. Thesis",
      address = "Cambridge, MA",
      school = "Massachusetts Institute of Technology",
    }
  • Sparse Tensor Transpositions in the Tensor Algebra Compiler  Suzanne Mueller
    Abstract
    The current state of the art for transposing sparse tensors involves converting the sparse tensor into a list of coordinates, sorting the list of coordinates, and finally packing the list of coordinates into the desired sparse tensor format. This thesis explores the potential for faster methodologies. Its main contributions are an algorithm that exploits partial sortedness to minimize sorting passes and an implementation that demonstrates that this transposition algorithm is competitive with the state of the art. In particular, the algorithm takes advantage of the ordering that already exists to apply selective sorting passes and thereby reduce the amount of work that needs to be done to reorder the tensor.
    BibTeX
    @MastersThesis{mueller:2020:thesis,
      title = "Sparse Tensor Transpositions in the Tensor Algebra Compiler",
      author = "Suzanne Mueller",
      month = "Feb",
      year = "2020",
      url = "http://groups.csail.mit.edu/commit/papers/2020/suzanne_2020.pdf",
      type = "M.Eng. Thesis",
      address = "Cambridge, MA",
      school = "Massachusetts Institute of Technology",
    }
  • Sparse Tensor Algebra Compilation  Fredrik Kjolstad
    Abstract
    This dissertation shows how to compile any sparse tensor algebra expression to CPU and GPU code that matches the performance of hand-optimized implementations. A tensor algebra expression is sparse if at least one of its tensor operands is sparse, and a tensor is sparse if most of its values are zero. If a tensor is sparse, then we can store its nonzero values in a compressed data structure, and omit the zeros. Indeed, as the matrices and tensors in many important applications are extremely sparse, compressed data structures provide the only practical means to store them. A sparse tensor algebra expression may contain any number of operations, which must be compiled to fused sparse loops that compute the entire expression simultaneously. It is not viable to support only binary expressions, because their composition may result in worse asymptotic complexity than the fused implementation. I present compiler techniques to generate fused loops that coiterate over any number of tensors stored in different types of data structures. By design, these loops avoid computing values known to be zero due to the algebraic properties of their operators.
    Sparse tensor algebra compilation is made possible by a sparse iteration theory that formulates sparse iteration spaces as set expressions of the coordinates of nonzero values. By ordering iteration space dimensions hierarchically, the compiler recursively generates loops that coiterate over tensor data structures one dimension at a time. By organizing per-dimension coiteration into regions based on algebraic operator properties, it removes operations that will result in zero. And by transforming the sparse iteration spaces, it optimizes the generated code. The result is the first sparse iteration compiler, called the Tensor Algebra Compiler (taco). Taco can compile any tensor algebra expression, with tensors stored in different types of sparse and dense data structures, to code that matches the performance of hand-optimized implementations on CPUs and GPUs.
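    Example
    The set-expression view in miniature: for a(i) = b(i) + c(i) the sparse iteration space is the union of the coordinates of b and c, and the coiterating loop has three regions (both inputs, b alone, c alone); for multiplication the space would instead be the intersection. Hand-written illustration of the loop structure for two compressed vectors, not actual taco output.
      #include <vector>

      struct SparseVec { std::vector<int> crd; std::vector<double> val; };

      void add(const SparseVec& b, const SparseVec& c, SparseVec& a) {
        size_t pb = 0, pc = 0;
        while (pb < b.crd.size() && pc < c.crd.size()) {  // region: b_i and c_i
          int ib = b.crd[pb], ic = c.crd[pc];
          if (ib == ic) {
            a.crd.push_back(ib); a.val.push_back(b.val[pb] + c.val[pc]);
            pb++; pc++;
          } else if (ib < ic) {
            a.crd.push_back(ib); a.val.push_back(b.val[pb]); pb++;
          } else {
            a.crd.push_back(ic); a.val.push_back(c.val[pc]); pc++;
          }
        }
        for (; pb < b.crd.size(); pb++) {  // region: b_i alone
          a.crd.push_back(b.crd[pb]); a.val.push_back(b.val[pb]);
        }
        for (; pc < c.crd.size(); pc++) {  // region: c_i alone
          a.crd.push_back(c.crd[pc]); a.val.push_back(c.val[pc]);
        }
      }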
    BibTeX
    @PhDThesis{kjolstad:2020:thesis,
      title = "Sparse Tensor Algebra Compilation",
      author = "Fredrik Kjolstad",
      month = "Feb",
      year = "2020",
      url = "http://groups.csail.mit.edu/commit/papers/2020/kjolstad-thesis.pdf",
      type = "Ph.D. Thesis",
      address = "Cambridge, MA",
      school = "Massachusetts Institute of Technology",
    }

2019

  • A Tensor Algebra Compiler Library Interface and Runtime  Patricio Noyola
    Abstract
    Tensor algebra is a powerful tool for computing on multidimensional data and has applications in many fields. The number of possible tensor operations is infinite, so it is impossible to manually implement all of them to operate on different tensor dimensions. The Tensor Algebra Compiler (taco) introduced a compiler approach to automatically generate kernels for any compound tensor algebra operation on any input tensor formats.

    In this thesis, we present a new API for the taco library. The API removes the need to call compiler methods by introducing a delayed execution framework. Additionally, the API introduces multiple important tensor algebra features previously unavailable in taco. Finally, we propose extensions to taco’s code generation algorithm to automatically generate tensor API methods for any tensor format.

    The proposed API makes taco code cleaner, shorter and more elegant. Furthermore, the extensions to its code generation algorithm make the API scalable to new formats and operations.
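    Example
    The delayed-execution idea in generic form: defer the compile/assemble/compute pipeline into a thunk that runs once, on first access to the result. This is a hypothetical sketch with invented names, not the thesis's actual API.
      #include <functional>
      #include <optional>

      template <typename T>
      class Lazy {
        std::function<T()> thunk;   // the deferred pipeline
        std::optional<T> cached;    // memoized result
      public:
        explicit Lazy(std::function<T()> f) : thunk(std::move(f)) {}
        T& get() {                  // first access triggers the pipeline
          if (!cached) cached = thunk();
          return *cached;
        }
      };
      // Usage: Lazy<double> norm([&] { /* compile, assemble, compute */
      //                                 return result; });
      //        norm.get();  // pipeline runs here, exactly once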
    BibTeX
    @MastersThesis{noyola:2019:thesis,
      title = "A Tensor Algebra Compiler Library Interface and Runtime",
      author = "Patricio Noyola",
      month = "May",
      year = "2019",
      url = "http://groups.csail.mit.edu/commit/papers/2019/Patricio_thesis.pdf",
      type = "M.Eng. Thesis",
      address = "Cambridge, MA",
      school = "Massachusetts Institute of Technology",
    }
  • SuperTaco: Taco Tensor Algebra Kernels on Distributed Systems Using Legion  Sachin Dilip Shinde
    Abstract
    Tensor algebra is a powerful language for expressing computation on multidimensional data. While many tensor datasets are sparse, most tensor algebra libraries have limited support for handling sparsity. The Tensor Algebra Compiler (Taco) has introduced a taxonomy for sparse tensor formats that has allowed it to compile sparse tensor algebra expressions to performant C code, but it has not taken advantage of distributed systems.

    This work provides a code generation technique for creating Legion programs that distribute the computation of Taco tensor algebra kernels across distributed systems, and a scheduling language for controlling how this distributed computation is structured. This technique is implemented in the form of a command-line tool called SuperTaco. We perform a strong scaling analysis for the SpMV and TTM kernels under a row blocking distribution schedule, and find speedups of 9–10× when using 20 cores on a single node. For multi-node systems using 20 cores per node, SpMV achieves a 33.3× speedup at 160 cores and TTM achieves a 42.0× speedup at 140 cores.
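    Example
    Row blocking in simplified, single-node form: the rows of A are split into contiguous blocks and each worker computes its slice of y = Ax. Here std::thread stands in for Legion tasks, which would additionally partition the tensors across nodes; illustrative only.
      #include <thread>
      #include <vector>

      struct Csr { std::vector<int> pos, crd; std::vector<double> val; };

      // y must already have size nrows.
      void spmvRowBlocked(const Csr& A, const std::vector<double>& x,
                          std::vector<double>& y, int nworkers) {
        int nrows = (int)A.pos.size() - 1;
        std::vector<std::thread> workers;
        for (int w = 0; w < nworkers; w++) {
          int lo = nrows * w / nworkers;        // this worker's row block
          int hi = nrows * (w + 1) / nworkers;
          workers.emplace_back([&, lo, hi] {
            for (int i = lo; i < hi; i++) {
              double sum = 0.0;
              for (int p = A.pos[i]; p < A.pos[i + 1]; p++)
                sum += A.val[p] * x[A.crd[p]];
              y[i] = sum;
            }
          });
        }
        for (auto& t : workers) t.join();
      }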
    BibTeX
    @MastersThesis{shinde:2019:thesis,
      title = "SuperTaco: Taco Tensor Algebra Kernels on Distributed Systems Using Legion",
      author = "Sachin Dilip Shinde",
      month = "Feb",
      year = "2019",
      url = "http://groups.csail.mit.edu/commit/papers/2019/thesis-shinde.pdf",
      type = "M.Eng. Thesis",
      address = "Cambridge, MA",
      school = "Massachusetts Institute of Technology",
    }
  • Tensor Algebra Compilation with Workspaces  Fredrik Kjolstad, Peter Ahrens, Shoaib Kamil, and Saman Amarasinghe
    Abstract
    This paper shows how to extend sparse tensor algebra compilers to introduce temporary tensors called workspaces to avoid inefficient accesses to sparse data structures. We develop an intermediate representation (IR) for tensor operations called concrete index notation that specifies when sub-computations occur and where they are stored. We then describe the workspace transformation in this IR, how to programmatically invoke it, and how the IR is compiled to sparse code. Finally, we show how the transformation can be used to optimize sparse tensor kernels, including sparse matrix multiplication, sparse tensor addition, and the matricized tensor times Khatri-Rao product (MTTKRP).

    Our results show that the workspace transformation brings the performance of these kernels on par with hand-optimized implementations. For example, we improve the performance of MTTKRP with dense output by up to 35%, and enable generating sparse matrix multiplication and MTTKRP with sparse output, neither of which were supported by prior tensor algebra compilers.
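    Example
    The classic case for a workspace is sparse matrix multiplication C = A·B in Gustavson's formulation: accumulating each output row in a dense workspace replaces inserts into a sparse data structure with dense scatter and gather. Hand-written illustration of the transformed loop nest, not taco's generated code.
      #include <algorithm>
      #include <vector>

      struct Csr { std::vector<int> pos, crd; std::vector<double> val; };

      Csr spgemm(const Csr& A, const Csr& B, int nrows, int ncols) {
        Csr C;
        C.pos.push_back(0);
        std::vector<double> w(ncols, 0.0);  // dense workspace for one row of C
        std::vector<int> mark(ncols, -1);   // which columns are live this row
        std::vector<int> nzCols;
        for (int i = 0; i < nrows; i++) {
          nzCols.clear();
          for (int pA = A.pos[i]; pA < A.pos[i + 1]; pA++) {
            int k = A.crd[pA];
            for (int pB = B.pos[k]; pB < B.pos[k + 1]; pB++) {
              int j = B.crd[pB];
              if (mark[j] != i) { mark[j] = i; w[j] = 0.0; nzCols.push_back(j); }
              w[j] += A.val[pA] * B.val[pB];  // dense scatter into workspace
            }
          }
          std::sort(nzCols.begin(), nzCols.end());  // restore column order
          for (int j : nzCols) {                    // gather workspace into C
            C.crd.push_back(j);
            C.val.push_back(w[j]);
          }
          C.pos.push_back((int)C.crd.size());
        }
        return C;
      }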
    BibTeX
    @inproceedings{kjolstad:2019:workspaces,
     author = {Kjolstad, Fredrik and Ahrens, Peter and Kamil, Shoaib and Amarasinghe, Saman},
     title = {Tensor Algebra Compilation with Workspaces},
     booktitle = {Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization},
     series = {CGO 2019},
     year = {2019},
     isbn = {978-1-7281-1436-1},
     location = {Washington, DC, USA},
     pages = {180--192},
     numpages = {13},
     url = {http://dl.acm.org/citation.cfm?id=3314872.3314894},
     acmid = {3314894},
     publisher = {IEEE Press},
     address = {Piscataway, NJ, USA},
     keywords = {code optimization, concrete index notation, sparse tensor algebra, temporaries, workspaces},
    }

2018

  • Format Abstraction for Sparse Tensor Algebra Compilers  Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe
    Abstract
    This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants thereof. With these implementations at hand, our code generator can generate code to compute any tensor algebra expression on any combination of the aforementioned formats.

    To demonstrate our technique, we have implemented it in the taco tensor algebra compiler. Our modular code generator design makes it simple to add support for new tensor formats, and the performance of the generated code is competitive with hand-optimized implementations. Furthermore, by extending taco to support a wider range of formats specialized for different application and data characteristics, we can improve end-user application performance. For example, if input data is provided in the COO format, our technique allows computing a single matrix-vector multiplication directly with the data in COO, which is up to 3.6× faster than first converting the data to CSR.
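    Example
    The flavor of the interface, heavily simplified: each level kind describes one dimension of a tensor's storage and advertises how it can be accessed, and formats are compositions of levels. The structs below are illustrative, not taco's actual interface (which describes levels through properties such as ordered, unique, and full, plus access capabilities for locating and iterating).
      #include <vector>

      struct DenseLevel {       // positions are coordinates; supports locate
        int size;
        int locate(int parentPos, int coord) const {
          return parentPos * size + coord;
        }
      };

      struct CompressedLevel {  // a pos/crd array pair; supports iterate
        std::vector<int> pos, crd;
        // children of parent position p occupy crd[pos[p] .. pos[p+1])
      };

      // Compositions yield formats: (DenseLevel, CompressedLevel) over
      // (rows, columns) is CSR; two compressed levels give DCSR; and so on
      // for COO, DIA, ELL, and HASH with the paper's other level kinds.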
    BibTeX
    @article{chou:2018:formats,
     author = {Chou, Stephen and Kjolstad, Fredrik and Amarasinghe, Saman},
     title = {Format Abstraction for Sparse Tensor Algebra Compilers},
     journal = {Proc. ACM Program. Lang.},
     issue_date = {November 2018},
     volume = {2},
     number = {OOPSLA},
     month = oct,
     year = {2018},
     issn = {2475-1421},
     pages = {123:1--123:30},
     articleno = {123},
     numpages = {30},
     url = {http://doi.acm.org/10.1145/3276493},
     doi = {10.1145/3276493},
     acmid = {3276493},
     publisher = {ACM},
     address = {New York, NY, USA},
     keywords = {modular code generation, sparse tensor algebra compilation, tensor formats},
    }
  • Unified Sparse Formats for Tensor Algebra Compilers  Stephen Chou
    Abstract
    Tensor algebra is a powerful tool for computing on multidimensional data and has applications in many fields. Practical applications often deal with tensors that are sparse, and there exists a wide variety of formats for storing such tensors, each suited to specific types of applications and data. Examples of sparse tensor storage formats include COO, CSR, CSC, DCSR, BCSR, CSF, CSB, ELL, DIA, and hash maps.

    In this thesis, we propose a levelized hierarchical abstraction that represents these seemingly disparate formats and countless others, and that hides the details of each format behind a common interface. We show that this tensor representation facilitates automatic generation of efficient compute kernels for tensor algebra expressions with any combination of formats. This is accomplished with a code generation algorithm that generates code level by level, guided by the capabilities and properties of the levels.

    The performance of tensor algebra kernels generated using our technique is competitive with that of equivalent hand-implemented kernels in existing sparse linear and tensor algebra libraries. Furthermore, our technique can generate many more kernels for many more formats than exist in libraries or are supported by existing compiler techniques.
    BibTeX
    @MastersThesis{chou:2018:thesis,
      title = "Unified Sparse Formats for Tensor Algebra Compilers",
      author = "Stephen Chou",
      month = "Feb",
      year = "2018",
      url = "http://groups.csail.mit.edu/commit/papers/2018/chou-18-sm-thesis.pdf",
      type = "S.M. Thesis",
      address = "Cambridge, MA",
      school = "Massachusetts Institute of Technology",
    }

2017

  • The Tensor Algebra Compiler  Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe
    Abstract
    Tensor algebra is a powerful tool with applications in machine learning, data analytics, engineering and the physical sciences. Tensors are often sparse and compound operations must frequently be computed in a single kernel for performance and to save memory. Programmers are left to write kernels for every operation of interest, with different mixes of dense and sparse tensors in different formats. The combinations are infinite, which makes it impossible to manually implement and optimize them all. This paper introduces the first compiler technique to automatically generate kernels for any compound tensor algebra operation on dense and sparse tensors. The technique is implemented in a C++ library called taco. Its performance is competitive with best-in-class hand-optimized kernels in popular libraries, while supporting far more tensor operations.
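    Example
    Minimal use of the taco C++ library for sparse matrix-vector multiplication, following the project's documented workflow; details may vary by version.
      #include "taco.h"
      using namespace taco;

      int main() {
        Format csr({Dense, Sparse});  // compressed sparse row
        Format dv({Dense});           // dense vector

        Tensor<double> A("A", {3, 3}, csr);
        Tensor<double> b("b", {3}, dv);
        Tensor<double> c("c", {3}, dv);

        A.insert({0, 1}, 2.0);        // set a few nonzeros, then pack
        A.insert({2, 0}, 3.0);
        A.pack();
        for (int i = 0; i < 3; i++) b.insert({i}, 1.0);
        b.pack();

        IndexVar i, j;
        c(i) = A(i, j) * b(j);        // the expression, in index notation

        c.compile();   // generate and compile a kernel for this expression
        c.assemble();  // compute the output's sparsity structure
        c.compute();   // compute the output's values
      }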
    BibTeX
    @article{kjolstad:2017:taco,
     author = {Kjolstad, Fredrik and Kamil, Shoaib and Chou, Stephen and Lugato, David and Amarasinghe, Saman},
     title = {The Tensor Algebra Compiler},
     journal = {Proc. ACM Program. Lang.},
     issue_date = {October 2017},
     volume = {1},
     number = {OOPSLA},
     month = oct,
     year = {2017},
     issn = {2475-1421},
     pages = {77:1--77:29},
     articleno = {77},
     numpages = {29},
     url = {http://doi.acm.org/10.1145/3133901},
     doi = {10.1145/3133901},
     acmid = {3133901},
     publisher = {ACM},
     address = {New York, NY, USA},
     keywords = {code generation, iteration graphs, linear algebra, merge lattices, parallelism, performance, sparse data structures, tensor algebra, tensors}
    }
  • taco: A Tool to Generate Tensor Algebra Kernels  Fredrik Kjolstad, Stephen Chou, David Lugato, Shoaib Kamil, and Saman Amarasinghe
    Abstract
    Tensor algebra is an important computational abstraction that is increasingly used in data analytics, machine learning, engineering, and the physical sciences. However, the number of tensor expressions is unbounded, which makes it hard to develop and optimize libraries. Furthermore, the tensors are often sparse (most components are zero), which means the code has to traverse compressed formats. To support programmers we have developed taco, a code generation tool that generates dense, sparse, and mixed kernels from tensor algebra expressions. This paper describes the taco web and command-line tools and discusses the benefits of a code generator over a traditional library. See also the demo video at tensor-compiler.org/ase2017.
    BibTeX
    @inproceedings{kjolstad:2017:tacotool, 
      author={Kjolstad, Fredrik and Chou, Stephen and Lugato, David and Kamil, Shoaib and Amarasinghe, Saman}, 
      booktitle={2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)}, 
      title={taco: A Tool to Generate Tensor Algebra Kernels}, 
      year={2017}, 
      pages={943-948}, 
      keywords={data analysis;learning (artificial intelligence);mathematics computing;program compilers;software libraries;tensors;code generation tool;code generator;command-line tools;compressed formats;computational abstraction;data analytics;dense kernels;machine learning;mixed kernels;physical sciences;sparse kernels;taco web;tensor algebra expressions;tensor algebra kernels;tensor expressions;Indexes;Kernel;Libraries;Linear algebra;Tensile stress;Tools;Tensor algebra;compiler;linear algebra;sparse}, 
      doi={10.1109/ASE.2017.8115709}, 
      month={Oct}
    }

2016

  • An Investigation of Sparse Tensor Formats for Tensor Libraries  Parker Allen Tew
    Abstract
    Tensors provide a generalized structure to store arbitrary indexable data, which is applicable in fields such as chemometrics, physics simulation, and signal processing, and lies at the heart of machine learning. Many naturally occurring tensors are considered sparse as they contain mostly zero values. As with sparse matrices, various techniques can be employed to more efficiently store and compute on these sparse tensors.

    This work explores several sparse tensor formats while ultimately evaluating two implementations: one based on explicitly storing coordinates and one that compresses these coordinates. The two formats, Coordinate and CSF2, were evaluated by comparing their execution times on tensor-matrix products and the MTTKRP operation on several datasets. We find that the Coordinate format is superior for uniformly distributed sparse tensors or when used in computation that emits a sparse tensor via a mode-dependent operation. In all other considered cases for large sparse tensors, the storage savings of the compressed format provide the best results.
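    Example
    The storage trade-off under evaluation, in simplified form; these structs are illustrative, not the thesis's exact layouts.
      #include <vector>

      struct Coo3 {                 // every nonzero stores all 3 coordinates
        std::vector<int> i, j, k;   // nnz entries in each array
        std::vector<double> val;
      };

      struct Csf3 {                 // compressed: coordinate prefixes shared
        std::vector<int> crdI;        // distinct i values
        std::vector<int> posJ, crdJ;  // j ranges per i; distinct (i,j) pairs
        std::vector<int> posK, crdK;  // k ranges per (i,j); one per nonzero
        std::vector<double> val;      // aligned with crdK
      };
      // Coordinate storage wins when prefixes rarely repeat (uniformly
      // scattered nonzeros); CSF-style compression wins when many nonzeros
      // share i or (i, j) prefixes, matching the findings above.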
    BibTeX
    @MastersThesis{tew:2016:thesis,
      title = "An Investigation of Sparse Tensor Formats for Tensor Libraries",
      author = "Parker Allen Tew",
      month = "Jun",
      year = "2016",
      url = "http://groups.csail.mit.edu/commit/papers/2016/parker-thesis.pdf",
      type = "M.Eng. Thesis",
      address = "Cambridge, MA",
      school = "Massachusetts Institute of Technology",
    }