That might depend on the tensor sizes and the device.
I’m afraid einsum might not always be optimal, as it is much more general than other ops.
In this case, you can use broadcasting to get the same result:
In [1]: import torch
In [2]: N, M, D = 10, 20, 30
In [3]: t1 = torch.rand(N,D)
In [4]: t2 = torch.rand(M,D)
In [5]: %timeit t3=torch.einsum('nd,md->nmd',t1,t2)
41.2 µs ± 263 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [6]: %timeit t3=torch.einsum('nd,md->nmd',t1,t2)
41.3 µs ± 325 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: %timeit t3=t1.unsqueeze(1) * t2.unsqueeze(0)
18.4 µs ± 130 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]: %timeit t3=t1.unsqueeze(1) * t2.unsqueeze(0)
23.1 µs ± 3.72 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [9]: (t1.unsqueeze(1) * t2.unsqueeze(0) - torch.einsum('nd,md->nmd',t1,t2)).abs().max()
Out[9]: tensor(0.)
Note that the timings may vary wildly for different sizes or if you use a GPU.
Also, for other ops, einsum might be faster.
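To make the shape mechanics explicit, here is a minimal sketch (with arbitrary small sizes) of how the two unsqueeze calls line the tensors up for broadcasting, and a check that the result matches einsum:

```python
import torch

N, M, D = 4, 5, 6
t1 = torch.rand(N, D)
t2 = torch.rand(M, D)

# unsqueeze inserts a size-1 dim, so the elementwise multiply broadcasts:
# (N, 1, D) * (1, M, D) -> (N, M, D)
out = t1.unsqueeze(1) * t2.unsqueeze(0)
ref = torch.einsum('nd,md->nmd', t1, t2)

assert out.shape == (N, M, D)
assert torch.allclose(out, ref)
```

The broadcasted multiply avoids einsum’s general-purpose dispatch, which is where the speedup in the timings above comes from.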