Crossed Element-wise multiplication

I have two tensors. t1 of shape NxD and t2 of shape MxD.

I want to get a tensor t3 of shape NxMxD in which t3[n,m,:] = t1[n,:] * t2[m,:], where * is element-wise multiplication.

Is there an efficient way to do this? I can do it with for loops, but I need a better way.
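
For concreteness, here is a plain loop-based sketch of the operation I want to avoid (the sizes are just placeholders):

import torch

# Loop version of: t3[n, m, :] = t1[n, :] * t2[m, :]
N, M, D = 4, 5, 6
t1 = torch.rand(N, D)
t2 = torch.rand(M, D)

t3 = torch.empty(N, M, D)
for n in range(N):
    for m in range(M):
        # element-wise product of two length-D vectors
        t3[n, m, :] = t1[n, :] * t2[m, :]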

Thank you very much.

Ben

t3 = torch.einsum('nd,md->nmd', t1, t2)

This seems to work!

I wonder if it is implemented efficiently.

Thank you very much!

Ben

I have no idea. I really hope so. For sure it is CUDA-parallelized.
@albanD Can you give us any insight about the einsum implementation? 🙂

That might depend on size and device.
I’m afraid it might not always be optimal as it is much more general than other ops.
In this case, you can use broadcasting to get the same result:

In [1]: import torch

In [2]: N, M, D = 10, 20, 30

In [3]: t1 = torch.rand(N,D)

In [4]: t2 = torch.rand(M,D)

In [5]: %timeit t3=torch.einsum('nd,md->nmd',t1,t2)
41.2 µs ± 263 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: %timeit t3=torch.einsum('nd,md->nmd',t1,t2)
41.3 µs ± 325 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit t3=t1.unsqueeze(1) * t2.unsqueeze(0)
18.4 µs ± 130 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [8]: %timeit t3=t1.unsqueeze(1) * t2.unsqueeze(0)
23.1 µs ± 3.72 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [9]: (t1.unsqueeze(1) * t2.unsqueeze(0) - torch.einsum('nd,md->nmd',t1,t2)).abs().max()
Out[9]: tensor(0.)

Note that the timing may vary wildly for different sizes or if you use a GPU.
Also, for other ops, einsum might be faster.
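
If you do want to compare the two on a GPU, a rough sketch of a fair timing would look like this (sizes are arbitrary; CUDA kernels run asynchronously, so you need torch.cuda.synchronize() around the measurement):

import time
import torch

if torch.cuda.is_available():
    device = torch.device('cuda')
    N, M, D = 500, 500, 64
    t1 = torch.rand(N, D, device=device)
    t2 = torch.rand(M, D, device=device)

    def bench(label, fn, iters=100):
        fn()                      # warm-up so one-time initialization isn't timed
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        torch.cuda.synchronize()  # wait for all queued kernels to finish
        print(label, (time.perf_counter() - start) / iters)

    bench('einsum', lambda: torch.einsum('nd,md->nmd', t1, t2))
    bench('broadcast', lambda: t1.unsqueeze(1) * t2.unsqueeze(0))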

Also, opt_einsum might improve einsum's performance, but I haven't tried it for this particular example.
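
For reference, using the opt_einsum package here would look roughly like this (untested for this case; assumes the third-party opt_einsum package is installed):

import torch
import opt_einsum as oe  # third-party package: pip install opt_einsum

N, M, D = 10, 20, 30
t1 = torch.rand(N, D)
t2 = torch.rand(M, D)

# same subscript notation as torch.einsum; opt_einsum dispatches to the torch backend
t3 = oe.contract('nd,md->nmd', t1, t2)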