Torch.einsum is ~4x faster than broadcasting torch.matmul for my use case

Hi Santosh!

I don’t know why einsum() is faster than matmul() in your case, but we have
seen similar behavior before, for example in this post:

(We’ve also seen cases where einsum() is unexpectedly and unreasonably
slow.)

As an aside, I might guess that it would be better to describe this as an
(unexpected) slowdown in matmul(), rather than as a speedup in einsum().
Have you considered comparing the matmul() timings with a loop version?
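
For reference, here is a minimal sketch of such a three-way comparison. The
shapes are placeholders, since your actual use case isn’t shown, and if you
run this on the GPU you would also need torch.cuda.synchronize() around the
timed region:

```python
import time
import torch

# Hypothetical shapes -- placeholders for a batched matrix product where
# the 2d operand is broadcast over the leading batch dimension.
a = torch.randn(64, 128, 256)
b = torch.randn(256, 512)

def bench(fn, n_iter=100):
    # Run once as a warm-up so one-time initialization doesn't skew timing.
    fn()
    t0 = time.perf_counter()
    for _ in range(n_iter):
        fn()
    return (time.perf_counter() - t0) / n_iter

def loop_matmul():
    # Explicit per-batch loop -- a baseline that helps tell whether it is
    # matmul's broadcasting path, rather than the matmul kernel itself,
    # that is slow.
    return torch.stack([a[i] @ b for i in range(a.shape[0])])

t_einsum = bench(lambda: torch.einsum('bij,jk->bik', a, b))
t_matmul = bench(lambda: torch.matmul(a, b))
t_loop = bench(loop_matmul)

print(f'einsum: {t_einsum * 1e6:.1f} us')
print(f'matmul: {t_matmul * 1e6:.1f} us')
print(f'loop:   {t_loop * 1e6:.1f} us')
```

If the loop version is comparable to einsum(), that would point to the
broadcast path in matmul() being the culprit.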

Best.

K. Frank
