Why does the Linear module seem to do unnecessary transposing?

I was also thinking about this, and found this issue:

From what I understand, the transpose in the forward pass has no overhead. But the backward pass would be less efficient if

input.matmul(weight)
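
On the "no overhead" part: here is a toy pure-Python sketch of why a transpose can be free in a strided tensor library. The `StridedView` class is my own illustration, not PyTorch code; the assumption is that a transpose only swaps the shape and strides and keeps pointing at the same buffer, which as far as I know is also how `weight.t()` behaves (it returns a view rather than a copy).

```python
class StridedView:
    """Toy 2-D view over a flat buffer (hypothetical, not PyTorch's implementation)."""

    def __init__(self, buffer, shape, strides):
        self.buffer = buffer      # shared flat list, never copied
        self.shape = shape        # (rows, cols)
        self.strides = strides    # buffer steps per row index / per column index

    def __getitem__(self, index):
        i, j = index
        return self.buffer[i * self.strides[0] + j * self.strides[1]]

    def t(self):
        # Transpose: swap shape and strides; the buffer is reused as-is.
        return StridedView(self.buffer, self.shape[::-1], self.strides[::-1])


# A 2x3 "weight" stored row-major: [[1, 2, 3], [4, 5, 6]]
weight = StridedView([1, 2, 3, 4, 5, 6], shape=(2, 3), strides=(3, 1))
weight_t = weight.t()

print(weight_t.shape)                    # shape of the transposed view
print(weight_t[2, 0], weight[0, 2])      # same element read through both views
print(weight_t.buffer is weight.buffer)  # whether the data is shared
```

This only covers the forward-pass claim. It says nothing about how the memory layout affects the matmul kernels used in the backward pass.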