Almost any kind of tensor operation (in particular, tensor contraction) in PyTorch runs at a speed comparable to a matrix multiplication of similar size, so I'm assuming that PyTorch is able to express almost any tensor operation in terms of BLAS routines. My goal is to understand how exactly PyTorch does that.
I can work out particular cases by hand, but I'm not sure how to generalise that to an arbitrary tensor contraction. Thanks
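For what it's worth, here is a minimal sketch of the general recipe I believe is at work (this is my own illustration, not PyTorch's actual internals): permute each operand so its contracted indices sit together, reshape to collapse the free indices into one axis and the contracted indices into another, and the contraction becomes a single 2-D matmul that a BLAS GEMM can handle.

```python
import torch

# Contraction: C[i, l] = sum over j, k of A[i, j, k] * B[j, k, l]
A = torch.randn(2, 3, 4)
B = torch.randn(3, 4, 5)

# Direct form via einsum:
C_einsum = torch.einsum('ijk,jkl->il', A, B)

# Same contraction as one matmul: the contracted pair (j, k)
# is already adjacent, so just fold it into a single axis.
C_matmul = A.reshape(2, 3 * 4) @ B.reshape(3 * 4, 5)
assert torch.allclose(C_einsum, C_matmul, atol=1e-5)

# When the contracted index is not in the right position,
# permute first. D[a, c] = sum over b of A2[b, a] * B2[b, c]:
A2 = torch.randn(3, 2)
B2 = torch.randn(3, 4)
D_einsum = torch.einsum('ba,bc->ac', A2, B2)
# Move the contracted index b to the inner axis of A2, then matmul.
D_matmul = A2.permute(1, 0) @ B2
assert torch.allclose(D_einsum, D_matmul, atol=1e-5)
```

The only caveat I'm aware of is that a permute generally forces a copy (via `.contiguous()` or an equivalent) before the reshape can be done, so "comparable to a matmul" really means a GEMM plus possibly one data-movement pass.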