When you apply a Linear to a tensor, you are not exactly (left)
multiplying the Linear's weight matrix onto the input tensor.
Rather, you are right-multiplying the input tensor by the transpose
of the weight matrix, so that the matrix-multiplication dimensions
match up properly.
Here is an illustrative script:
import torch

lin = torch.nn.Linear(3, 5, bias=False)
inp = torch.randn(2, 3)  # Variable is deprecated; plain tensors carry autograd
inp.matmul(lin.weight.transpose(0, 1))  # shape (2, 5)
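To see that the two formulations agree, you can compare the output of the Linear itself against the manual right-multiplication (a minimal check; the tensor names are just illustrative):

```python
import torch

lin = torch.nn.Linear(3, 5, bias=False)
inp = torch.randn(2, 3)

# The layer computes inp @ weight.T internally, so both paths
# produce a (2, 5) result with the same values.
out_layer = lin(inp)
out_manual = inp.matmul(lin.weight.transpose(0, 1))

print(out_layer.shape)                        # torch.Size([2, 5])
print(torch.allclose(out_layer, out_manual))  # True
```

Note that with `bias=True` (the default), the layer would also add its bias vector, so the manual version would need `+ lin.bias` to match.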