Performance of Right-Facing vs Left-Facing matmuls

Hi @Neel_Nanda,

Can you share a minimal reproducible example?

  1. When you compare torch.einsum with nn.Linear, make sure you use nn.Linear(bias=False); otherwise the operations aren’t equivalent, since nn.Linear adds a bias term by default (see the first sketch below).

  2. When you measure times for code snippets, make sure you synchronize CUDA (torch.cuda.synchronize()) before calling time.time(); otherwise you only record the time to launch an operation, not its full runtime, especially when it runs on the GPU, because CUDA kernels execute asynchronously (see the timing sketch below). There’s more info about it here.
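
A minimal sketch of the first point, assuming illustrative shapes: nn.Linear stores its weight as (out_features, in_features) and computes x @ W.T, so the matching einsum contracts over the input dimension, and the outputs should agree only when bias=False.

```python
import torch
import torch.nn as nn

# Illustrative sizes (not from the original post)
batch, d_in, d_out = 4, 16, 32
x = torch.randn(batch, d_in)

linear = nn.Linear(d_in, d_out, bias=False)

# nn.Linear computes x @ W.T; the equivalent einsum contracts over d_in
out_linear = linear(x)
out_einsum = torch.einsum("bi,oi->bo", x, linear.weight)

print(torch.allclose(out_linear, out_einsum, atol=1e-6))  # True
```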
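
And a sketch of the second point, assuming a square matmul on a CUDA device purely for illustration: synchronize once before starting the timer and once after the timed loop, and warm up first so one-off setup costs aren’t included.

```python
import time
import torch

# Illustrative sizes and workload (not from the original post)
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# Warm-up so one-time CUDA/kernel setup isn't measured
for _ in range(3):
    torch.matmul(a, b)

torch.cuda.synchronize()   # make sure pending kernels have finished
start = time.time()
for _ in range(100):
    torch.matmul(a, b)
torch.cuda.synchronize()   # wait for the timed kernels to complete
elapsed = time.time() - start

print(f"{elapsed / 100 * 1e3:.3f} ms per matmul")
```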