Hi Jean Paul!

Here’s the best I could come up with for a loop-free version:

At the cost of materializing a large, nine-dimensional “contraction-kernel”

tensor, you can replace the six for-loops with a single `einsum()`

call.

Here is a script that implements this:

```
import torch
print (torch.__version__)
_ = torch.manual_seed (2023)
num_faces = 10
max_m = 5
C2 = torch.rand(num_faces,max_m,max_m,max_m)
C3 = torch.rand(num_faces,max_m,max_m,max_m)
# This is the naive slow implementation with 6 loops, but is correct
D_abc_correct = torch.zeros(num_faces,max_m,max_m,max_m)
for a in range(max_m):
for b in range(max_m):
for c in range(max_m):
for i2 in range(a+1):
for j2 in range(b+1):
for k2 in range(c+1):
D_abc_correct[:,a,b,c] += C2[:,i2,j2,k2]*C3[:,a-i2,b-j2,c-k2]
# set up "contraction kernel"
inds = torch.arange (max_m)
kernel = ((inds.unsqueeze (0) + inds.unsqueeze (1)).unsqueeze (0) == inds.unsqueeze (-1).unsqueeze (-1)).float()
print ('check one slice of kernel:')
print (kernel[2]) # check one slice to see what it looks like
kernelProd = torch.einsum ('ail, bjm, ckn -> abcijklmn', kernel, kernel, kernel) # "outer-product" kernel
print ('kernelProd is big, but rather sparse:')
print ('numel(): ', kernelProd.numel()) # quite big
print ('num-zero: ', (kernelProd == 0.0).sum().item()) # with lots of zeros
print ('frac-nonzero:', (kernelProd != 0.0).sum().item() / kernelProd.numel()) # so rather sparse
kernelProd = kernelProd.unsqueeze (0).expand (num_faces, -1, -1, -1, -1, -1, -1, -1, -1, -1) # expand() view with num_faces dimension
D_abc_B = torch.einsum ('eijk, eabcijklmn, elmn -> eabc', C2, kernelProd, C3)
# check einsum result
print ('torch.allclose (D_abc_correct, D_abc_B):', torch.allclose (D_abc_correct, D_abc_B))
```

And here is its output:

```
2.0.0
check one slice of kernel:
tensor([[0., 0., 1., 0., 0.],
[0., 1., 0., 0., 0.],
[1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
kernelProd is big, but rather sparse:
numel(): 1953125
num-zero: 1949750
frac-nonzero: 0.001728
torch.allclose (D_abc_correct, D_abc_B): True
```

Note, you can reuse `kernelProd`

, so even if you have to run this

computation multiple times for differing values of the input tensors,

`C2`

and `C3`

, you only have to construct `kernelProd`

once.

Best.

K. Frank