I don’t know why einsum() is faster than matmul() here, but we have seen
similar behavior before, for example in this post:
(We’ve also seen cases where einsum() is unexpectedly and unreasonably
slow.)
As an aside, my guess is that this is better described as an (unexpected)
slowdown in matmul() than as a speedup in einsum().
Have you considered comparing the matmul() timings with a loop version?
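
Something along these lines might serve as a starting point for that comparison. It is only a sketch under assumptions: I don't know your actual shapes or dtypes, so the sizes below are made up, I'm assuming the operation is a batched matrix product, and I've written it with NumPy so that it runs anywhere (the einsum() and matmul() calls have the same names in PyTorch). The helper names (`batched_loop`, `time_variants`, and so on) are hypothetical.

```python
import timeit

import numpy as np


def batched_matmul(a, b):
    # Batched product via matmul(): (B, I, J) @ (B, J, K) -> (B, I, K)
    return np.matmul(a, b)


def batched_einsum(a, b):
    # The same contraction written as an einsum() expression
    return np.einsum("bij,bjk->bik", a, b)


def batched_loop(a, b):
    # Explicit Python loop over the batch dimension, one 2-D product per step
    out = np.empty((a.shape[0], a.shape[1], b.shape[2]), dtype=np.result_type(a, b))
    for i in range(a.shape[0]):
        out[i] = a[i] @ b[i]
    return out


def time_variants(a, b, repeats=5, number=10):
    # Return the best observed seconds-per-call for each variant
    variants = {
        "matmul": batched_matmul,
        "einsum": batched_einsum,
        "loop": batched_loop,
    }
    timings = {}
    for name, fn in variants.items():
        fn(a, b)  # warm-up call, excluded from the measurement
        best = min(timeit.repeat(lambda: fn(a, b), repeat=repeats, number=number))
        timings[name] = best / number
    return timings


if __name__ == "__main__":
    # Assumed shapes, chosen only for illustration
    rng = np.random.default_rng(0)
    a = rng.standard_normal((64, 32, 48))
    b = rng.standard_normal((64, 48, 16))
    for name, seconds in time_variants(a, b).items():
        print(f"{name:>6}: {seconds * 1e6:9.1f} us per call")
```

If the loop version lands close to einsum() while matmul() is the outlier, that would point toward a matmul() slowdown rather than an einsum() speedup. If you are timing on a GPU, you would also need to synchronize before reading the clock, which this sketch does not do.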