Specifying Tensor Algebra Computations

Tensor algebra computations can be expressed in TACO using tensor index notation, which at a high level describes how each element in the result tensor can be computed from elements in the operand tensors. As an example, matrix addition can be expressed in index notation as

where , , and denote two-dimensional tensors (i.e., matrices) while and are index variables that represent abstract indices into the corresponding dimensions of the tensors. In plain English, the example above essentially states that, for every and , the element in the -th row and -th column of should be assigned the sum of the corresponding elements in and . Similarly, element-wise multiplication of three tensors can be expressed in index notation as

To define the same computation using the TACO Python library, we can write very similar code, with the main difference being that we also have to explicitly declare the index variables beforehand:

i, j, k = pytaco.index_var(), pytaco.index_var(), pytaco.index_var()
A[i,j,k] = B[i,j,k] * C[i,j,k] * D[i,j,k]

This can also be rewritten more compactly as

i, j, k = pytaco.get_index_vars(3)
A[i,j,k] = B[i,j,k] * C[i,j,k] * D[i,j,k]

Note

Accesses to scalars also require the square brackets notation. Since scalars are equivalent to tensors with zero dimension, None must be explicitly specified as indices to indicate that no index variable is needed to access a scalar. As an example, the following expresses the addition of two scalars beta and gamma:

alpha[None] = beta[None] + gamma[None]

Warning

TACO currently does not support computations that have a tensor as both an operand and the result, such as the following:

a[i] = a[i] * b[i]

Such computations can be rewritten using explicit temporaries as follows:

t[i] = a[i] * b[i]
a[i] = t[i]

Warning

TACO currently does not support using the same index variable to index into multiple dimensions of the same tensor operand (e.g., A[i,i]).

Expressing Reductions

In all of the previous examples, all the index variables are used to index into both the result and the operands of a computation. It is also possible for an index variable to be used to index into the operands only, in which case the dimension indexed by that index variable is reduced (summed) over. For instance, the computation

can be rewritten with the summation more explicit as

and demonstrates how matrix-vector multiplication can be expressed in index notation. Both forms are supported by TACO:

i, j = pytaco.get_index_vars(2)

y[i] = A[i,j] * x[j]
y[i] = pytaco.sum(j, A[i,j] * x[j])

Reductions that are not explicitly expressed are assumed to be over the smallest subexpression that captures all uses of the corresponding reduction variable. For instance, the computation

is equivalent to

whereas the computation

is equivalent to

Expressing Broadcasts

TACO supports computations that broadcasts tensors along any number of dimensions. The following example, for instance, broadcasts the vector c along the row dimension of matrix B, adding c to each row of B:

A[i, j] =  B[i, j] + c[j]

However, TACO does not support NumPy-style broadcasting of dimensions that have a size of one. For example, the following is not allowed:

A = pt.tensor([3,3])
B = pt.tensor([3,3])
C = pt.tensor([3,1])
i, j = pt.get_index_vars(2)

A[i, j] =  B[i, j] + C[i, j]  # ERROR!!

Expressing Transposes

Computations that transpose tensors can be expressed by rearranging the order in which index variables are used to access tensor operands. The following example, for instance, adds matrix B to the transpose of matrix C and stores the result in matrix A:

A = pt.tensor([3,3], pt.format([dense, dense]))
B = pt.tensor([3,3], pt.format([dense, dense]))
C = pt.tensor([3,3], pt.format([dense, dense]))
i, j = pt.get_index_vars(2)

A[i,j] = B[i,j] + C[j,i]

Note, however, that sparse dimensions of tensor operands impose dependencies on the order in which they can be accessed, based on the order in which they are stored in the operand formats. This means, for instance, that if B is a CSR matrix, then B[i,j] requires that the dimension indexed by i be accessed before the dimension indexed by j. TACO does not support any computation where these constraints form a cyclic dependency. So the same computation from before is not supported for CSR matrices, since the access of B requires that i be accessed before j but the access of C requires that j be accessed before i:

A = pt.tensor([3,3], pt.format([dense, compressed]))
B = pt.tensor([3,3], pt.format([dense, compressed]))
C = pt.tensor([3,3], pt.format([dense, compressed]))
i, j = pt.get_index_vars(2)

A[i,j] = B[i,j] + C[j,i]  # ERROR!!

As an alternative, you can first explicitly transpose C by invoking its transpose method, storing the result in a temporary, and then perform the addition with the already-transposed temporary:

A = pt.tensor([3,3], pt.format([dense, compressed]))
B = pt.tensor([3,3], pt.format([dense, compressed]))
C = pt.tensor([3,3], pt.format([dense, compressed]))
i, j = pt.get_index_vars(2)

Ct = C.transpose([1, 0])  # Ct is also stored in the CSR format
A[i,j] = B[i,j] + Ct[i,j]

Similarly, the following computation is not supported for the same reason that the access of B, which is stored in row-major order, requires i to be accessed before j but the access of C, which is stored in column-major order, requires j to be accessed before i:

A = pt.tensor([3,3], pt.format([dense, compressed]))
B = pt.tensor([3,3], pt.format([dense, compressed]))
C = pt.tensor([3,3], pt.format([dense, compressed], [1, 0]))
i, j = pt.get_index_vars(2)

A[i,j] = B[i,j] + C[i,j]  # ERROR!!

We can again perform the same computation by invoking transpose, this time to repack C into the same CSR format as A and B before computing the addition:

A = pt.tensor([3,3], pt.format([dense, compressed]))
B = pt.tensor([3,3], pt.format([dense, compressed]))
C = pt.tensor([3,3], pt.format([dense, compressed], [1, 0]))
i, j = pt.get_index_vars(2)

Cp = C.transpose([0, 1], pt.format([dense, compressed]))  # Store a copy of C in the CSR format
A[i,j] = B[i,j] + Cp[i,j]

Performing the Computation

Once a tensor algebra computation has been defined, you can simply invoke the result tensor's evaluate method to perform the actual computation:

A.evaluate()

Under the hood, TACO will first invoke the result tensor's compile method to generate code that performs the computation. TACO will then perform the actual computation by first invoking assemble to compute the sparsity structure of the result and subsequently invoking compute to compute the values of the result's nonzero elements. Of course, you can also manually invoke these methods in order to more precisely control when each step happens:

A.compile()
A.assemble()
A.compute()

If you define a computation and then access the result without first manually invoking evaluate or compile/assemble/compute, TACO will automatically invoke the computation immediately before the result is accessed. In the following example, for instance, TACO will automatically generate code to compute the vector addition and then also actually perform the computation right before a[0] is printed:

a[i] = b[i] + c[i]
print(a[0])