This article is related to my project MATLAB Tensors and explains how one of the key functions there  tensorHadamard
 works, as well as use cases where it can help.
The Problem / Motivation
I was working on an optimization problem recently and creating a numerical solution in MATLAB. During this work, I've encountered a similar problem of calculating one tensor (multidimensional array) based on several input vectors/matrices/tensors. Here's an example of one such encounter (but just for illustration, don't worry!):
\[\frac{{\partial }^2v_G}{{\left(\partial m\right)}^2}\ [n*n*n]\to \frac{{\partial }^2v_{Gi}}{\partial m_j\partial m_k}=\frac{{\overline{r}}_i}{{\mu }^2_{Fi}}\frac{\partial {\mu }_{Fi}}{\partial m_j}\frac{\partial {\mu }_{Fi}}{\partial m_k}\left(\frac{v_j}{{\overline{r}}_j}+\frac{2}{R^2_G}\frac{x_i}{{\mu }_{Fi}}\frac{v_k}{{\overline{r}}_k}\right)\]
It looks ugly and scary, I know, but it mostly comes from some scary notations used (like partial derivatives, Greek letters, etc.). In reality, this problem is quite easy: on the right side, we have one matrix, four vectors, and one scalar, and on the left side (the result) we have some tensor (threedimensional tensor in this example). In fact, the essence of this problem is quite simple and can be expressed like:
\[\Large T_{[m*n*p]} \to t_{i,j,k}=a_i*b_j*c_k\]
Looks much better, right? Here, on the right side we have only three vectors: \(a_{[n]}\), with n
elements, \(b_{[m]}\), with m
elements, and \(c_{[p]}\), with p
elements. But what's the problem then? This equation is so simple that many developers would be able to implement it in minutes  just use loops! Here's one such implementation (GitHub):
function T = example1Loops(a, b, c)
%EXAMPLE1LOOPS Calculates the tensor by using loops.
% Input arguments a, b and c are column vectors.
% Get dimensions:
n = length(a);
m = length(b);
p = length(c);
% Initialize the result:
T = zeros(n, m, p);
% Calculate:
for i = 1:n
for j = 1:m
for k = 1:p
T(i, j, k) = a(i) * b(j) * c(k);
end
end
end
end
Although being accurate, this implementation suffers one problem: performance. We have a loop within a loop, within a loop. If vectors a
, b
and c
are big, we would have too many loop passes. For example, if a
, b
and c
have 100 elements each, it would cause one million loop passes. If they are 1000 elements each, it will cause one billion passes, and it'll take some time. It's a wellknown fact that loops should be avoided in MATLAB. In fact, it's true for most (if not all) programming languages.
Can we implement the same calculation without using loops but using some matrix manipulations/arithmetics instead? In fact, we can. As we all know, MATLAB is quite good with matrices. At the end of the day, it's named based on that (its name comes from MATrix LABoratory). It's fair to say that some other languages are also good at that, like Python and its NumPy library.
Note: Hardcore developers and engineers among you have probably noticed that what I've just said isn't completely true. Problems of this kind definitely require some repetition, so even if we don't write loops explicitly, some looping will have to happen in the background (in MATLAB engine). But the difference in performances between loops in compiled C/C++ code and loops in interpreted MATLAB code is huge. This is especially true because modern C/C++ compilers will use some CPU features, like Vectorization. To conclude: we should simply avoid loops in MATLAB and don't care about what happens down the stream.
Loopless Implementation
Implementation without loops will be based on the fact that in MATLAB we can perform elementwise multiplication (a.k.a Hadamard product) of two multidimensional arrays without using loops, simply by writing something like R = A .* B
. Here A
and B
must have the same size and the result R
will also have the same size. But in our original problem, the inputs are vectors (\(a_{[n]}\), \(b_{[m]}\), and \(c_{[p]}\)), while the output is a tensor (\(T_{[n,m,p]}\)), so we cannot use this feature directly. We should try to find a way to convert the input vectors to tensors with the required size \([n,m,p]\), and then get the same correct solution by simply multiplying these tensors elementwise. Let's write what I've just said in mathematical language. We need to find the following transformations:
\[\Large {a_{[n]}}\xrightarrow{{{f_a}}}{A_{[n,m,p]}};\space {b_{[m]}}\xrightarrow{{{f_b}}}{B_{[n,m,p]}};\space {c_{[p]}}\xrightarrow{{{f_c}}}{C_{[n,m,p]}}\]
so that the following operation produces the same correct result as the original solution based on loops:
\[\Large {T_{[n,m,p]}} = {A_{[n,m,p]}} \odot {B_{[n,m,p]}} \odot {C_{[n,m,p]}}\]
where \(\odot\) represents elementwise multiplication operation (Hadamard product) of tensors, with the corresponding MATLAB operation .*
.
Let's now try to find transformation \(f_a\) that transforms vector \(a_{[n]}\) to tensor \(A_{[n,m,p]}\). If we look at the original equation, we see that the output tensor elements \(t_{i,j,k}\) depend on \(a_i\) element of the input vector \(a_{[n]}\), for any value of j and k indexes. In other words, no matter the actual value of j and k indexes, we are always using the same value \(a_i\). It means that our newly created tensor \(A_{[n,m,p]}\) should be such that its elements depend only on i (the first dimension), and are the same along j and k (the second and third dimensions). We can easily create such tensor by simply aligning the vector \(a_{[n]}\) along its first dimension and then copy everything in the second and third direction. We can accomplish that in MATLAB in a single line of code:
A = repmat(a, [1, m, p]);
As you can see, I've used repmat
function to accomplish that. You can read more about the function in the official documentation. Here I will just interpret the function call in human language: Take vector a
, replicate it 1
time in the direction of the first dimension, m
times in the direction of the second dimension, and p
times in the direction of the third dimension. And this is exactly what we've actually wanted to accomplish.
By following the same logic, we may transform vectors b and c to tensors B and C, respectively. We may conclude that tensor B should be such that elements of vector b are aligned along its second dimension, and then copied in the first and third directions. Similarly, tensor C should be such that elements of vector c are aligned along its third dimension and then copied in the first and second directions. But there's one more thing that we should take care of in the implementation: vectors a, b and c are all assumed to be column vectors (oriented in the direction of the first dimension), and it was OK in the case of vector a because we wanted to align it along the first direction, but it isn't OK when it comes to vectors b and c. We want vector b to be oriented in the direction of the second dimension and c in the direction of the third dimension. For this reason, before calling repmat
function, we need to reshape these vectors, and we'll do that by using permute
function.
Based on everything said in this section, we may create a new, loopless implementation as follows (GitHub):
function T = example1Loopless(a, b, c)
%EXAMPLE1LOOPLESS Calculates the tensor without using loops.
% Input arguments a, b and c are column vectors.
% Get dimensions:
n = length(a);
m = length(b);
p = length(c);
% Transform a to A:
A = repmat(a, [1, m, p]);
% Transform b to B
% Reshape b:
b = permute(b, [2, 1]);
% Now we can do repmat:
B = repmat(b, [n, 1, p]);
% Transform c to C
% Reshape c:
c = permute(c, [2, 3, 1]);
% And now repmat:
C = repmat(c, [n, m, 1]);
% And simply calculate the result:
T = A .* B .* C;
end
Now both implementations (with and without loops) will produce the same result, and I'll show that in a test, but before that we need to consider one more thing: memory consumption.
Memory Consumption
Although our loopless implementation is faster than the one with loops, it suffers another problem: it consumes more memory. If the input vectors (a, b, and c) are big, each of the tensors created (A, B, C, and T) will consume significant memory. For example, if all the input vectors have 1000 elements, every tensor will have one billion elements, and if every element takes 8 bytes, the tensors will take about 8GB each. And since we're creating four tensors, our loopless implementation will consume about 32GB of memory. On the other side, in implementation based on loops, we're creating only one tensor, meaning that memory consumption will be much lower (about 8GB).
We can make things much better if we avoid creating tensors A
, B
, and C
in advance and keeping them in memory. Instead, we can create them onthefly when they are needed. So let's refactor the previous code to accomplish that (GitHub):
function T = example1Final(a, b, c)
%EXAMPLE1FINAL Calculates the tensor without using loops.
% Input arguments a, b and c are column vectors.
% Get dimensions:
n = length(a);
m = length(b);
p = length(c);
% Reshape the input vectors:
b = permute(b, [2, 1]);
c = permute(c, [2, 3, 1]);
% Calculate:
T = repmat(a, [1, m, p]);
T = T .* repmat(b, [n, 1, p]);
T = T .* repmat(c, [n, m, 1]);
end
Compare the Results
For comparing the results I've created example1CompareResults
function (GitHub), which simply creates three vectors a, b, and c with given dimensions and random elements, calculates the result by using both implementations (with and without loops) and measures the execution times, and finally checks if the results are equal. The input arguments of the function are desired lengths of vectors a, b, and c, respectively. Here are the outputs of several executions:
>> example1CompareResults(10, 11, 12);
Execute loopless implementation:
Elapsed time is 0.000329 seconds.
Execute implementation with loops:
Elapsed time is 0.000183 seconds.
The results are the same!
>> example1CompareResults(200, 201, 202);
Execute loopless implementation:
Elapsed time is 0.041648 seconds.
Execute implementation with loops:
Elapsed time is 0.092004 seconds.
The results are the same!
>> example1CompareResults(500, 501, 502);
Execute loopless implementation:
Elapsed time is 0.492632 seconds.
Execute implementation with loops:
Elapsed time is 4.736423 seconds.
The results are the same!
>> example1CompareResults(1000, 1001, 1002);
Execute loopless implementation:
Elapsed time is 4.021327 seconds.
Execute implementation with loops:
Elapsed time is 50.138179 seconds.
The results are the same!
Note that, for a small number of elements (the first execution with vector a of length 10, vector b of length 11, and vector c of length 12), implementation with loops is actually faster. But with a larger number of elements, the loopless implementation is significantly faster than one with loops. But keep in mind that even the loopless implementation will become slow if the number of elements is bigger than the available RAM can afford and if the OS starts using swap files.
Using the Library
Everything said so far explains how tensorHadamard
function from the MATLAB Tensors library works. tensorHadamard
function is actually a generalized form of example1Final
function we've created here. It allows us to have a different number of input arguments (not only vectors a, b, and c), the input arguments don't have to be vectors, but they also can be matrices or tensors of any order, the output result also can be a tensor of any order (size of the output depends on sizes of the inputs, of course), etc.
To recap: if we have three vectors \(a_{[n]}\), \(b_{[m]}\), and \(c_{[p]}\), and we need to create tensor \(T_{[n,m,p]}\) whose elements are defined as (same as above):
\[\Large T_{[m*n*p]} \to t_{i,j,k}=a_i*b_j*c_k\]
we can accomplish that in a single line of code in MATLAB (by using MATLAB Tensors library):
T = tensorHadamard(3, a, 1, b, 2, c, 3);
Let's quickly explain the input arguments:

The first argument (named
tSize
),3
in our case, defines the size of the output tensor. By specifying scalar value 3 we've actually said: "the output should be a tensor of order 3." 
The rest of the arguments (variable number of them) are coming in pairs (named
a, dimA
), and representing the input values. The first argument in the pair specifies the value, while the second specifies where the value belongs. In our example:
Pair
a, 1
represents the first input value, wherea
is the value (vector a in our case), while1
defines that it should be aligned along the first dimension (it belongs there). 
Pair
b, 2
represents the second input value, whereb
is the value, while2
defines that it should be aligned along the second dimension. 
Pair
c, 3
represents the third input value, wherec
is the value, while3
defines that it should be aligned along the third dimension.

Pair
You can find more details about the input arguments in the documentation, and you should do that because there you'll find more details about tSize
argument and different options for specifying it, the relation between a
and dimA
(pair of arguments that defines an input value and where it belongs), the default value used for dimA
(if it isn't specified or if it's empty), etc.
Extending the Applicable Problems Scope
If the only purpose of tensorHadamard
function was to solve the problem we were dealing with so far here (creating tensor by multiplying elements of three vectors), it wouldn't be too useful. Here I want to demonstrate that it actually can be used in many other scenarios. In fact, whenever you have a problem where you need to calculate some set of values (the resulting tensor) based on values contained in some input vectors/matrices/tensors, by looping through elements and calculating results onebyone, you can use tensorHadamard
instead to do this more efficiently (by writing less code which will execute faster).
Division
Let's start with a trivial problem: how can I use tensorHadamard
if besides multiplication I also need division? Assuming that we're starting from the same input vectors a, b, and c, can we create a tensor which will satisfy the following:
\[\Large T_{[m*n*p]} \to t_{i,j,k} = \frac{a_i*b_j}{c_k}\]
Here we can replace the division with multiplication simply by replacing vector \(c_{[p]}\) with vector \(c'_{[p]}\), gotten in the following way:
\[\Large {c'_{[p]}} = {c_{[p]}}^{ \circ (  1)}\]
The weird operator in the exponent in the equation above, represented by a circle similar to the degrees circle, actually represents elementwise power (or Hadamard power), and it means that power operation should not be performed on the whole matrix but on each of its elements. The corresponding MATLAB operator is .^
.
Once we have the replacement vector, we can write the problem in the form we are familiar with:
\[\Large T_{[m*n*p]} \to t_{i,j,k}=a_i*b_j*c'_k\]
Let's put all that into MATLAB code:
T = tensorHadamard(3, a, 1, b, 2, c .^ 1, 3);
That's it. We can use the same trick even if the input value isn't vector but matrix or tensor, of course.
Final Demo
To demonstrate how tensorHadamard
can be used with other operations (addition and subtraction), how it can be used not only with matrices but also with vectors, and to demonstrate commutativity and associativity of tensor Hadamard product, I will actually solve the ugly problem we've started with:
\[\frac{{\partial }^2v_G}{{\left(\partial m\right)}^2}\ [n*n*n]\to \frac{{\partial }^2v_{Gi}}{\partial m_j\partial m_k}=\frac{{\overline{r}}_i}{{\mu }^2_{Fi}}\frac{\partial {\mu }_{Fi}}{\partial m_j}\frac{\partial {\mu }_{Fi}}{\partial m_k}\left(\frac{v_j}{{\overline{r}}_j}+\frac{2}{R^2_G}\frac{x_i}{{\mu }_{Fi}}\frac{v_k}{{\overline{r}}_k}\right)\]
I've already said that it looks scary mostly because of ugly notation, so let's change that. After only changing the names of the variables, I've got the following:
\[\Large T_{[n*n*n]}\to t_{i,j,k}=\frac{r_i}{m^2_i}*D_{i,j}*D_{i,k}*\left(\frac{v_j}{r_j}+\frac{2}{R^2}\frac{x_i}{m_i}\frac{v_k}{r_k}\right)\]
It's a bit better now, right? So there we have output tensor \(T_{[n,n,n]}\), input scalar \(R\), input matrix \(D_{[n,n]}\), and several input vectors: \(r_{[n]}\), \(m_{[n]}\), \(v_{[n]}\), and \(x_{[n]}\). The corresponding MATLAB variables that I'll consider given and known are: R
, D
, r
, m
, v
, and x
.
Here I will simply provide two possible solutions, both giving the same, correct result, without too much explanation. Keep in mind though, that in order to understand the code, besides knowing things explained in this document, you also need to understand a few things from the documentation, specifically:

Different ways for specifying
tSize
argument. In the following code, I'm sometimes specifying it as scalar value3
, and sometimes (when I have to) as vector value[n,n,n]
. 
Valid
dimA
values when the input value isn't a vector (when it's a matrix). In short: if the input valuea
is a vector,dimA
should be a scalar, and we've already seen that. But when the input value is a matrix, thendimA
should be a vector of two elements. I'm sure that, knowing the purpose of this argument, you'll understand why is that.
Solution 1
% Start from everything in front of the parenthesis:
T = tensorHadamard(3, r ./ m .^ 2, 1, D, [1,2], D, [1,3]);
% Precalculate v ./ r because it appears twice:
vr = v ./ r;
% Finalize:
T =  T .* tensorHadamard([n,n,n], vr, 2) + 2 / R^2 * T .* tensorHadamard([n,n,n], x ./ m, 1)  T .* tensorHadamard([n,n,n], vr, 3);
Solution 2
% Precalculate v ./ r because it appears twice:
vr = v ./ r;
% Start from everything within the parenthesis:
T =  tensorHadamard([n,n,n], vr, 2) + 2 / R^2 * tensorHadamard([n,n,n], x ./ m, 1)  tensorHadamard([n,n,n], vr, 3);
% Finalize:
T = T .* tensorHadamard(3, r ./ m .^ 2, 1, D, [1,2], D, [1,3]);
Add new comment
Anonymous comments require solving captcha, and waiting for administrator's approval.