nn.Linear(vocab_size, num_labels)
means that the weight matrix has shape num_labels x vocab_size.
The bow_vector has shape 1 x vocab_size, and the input expected by nn.Linear is batch_size x features.
So we would be multiplying a num_labels x vocab_size matrix by a 1 x vocab_size tensor, and the dimensions don't match for matrix multiplication. What am I missing here?
When you apply a Linear to a tensor, you are not exactly
(left-)multiplying the Linear's weight matrix onto the input tensor.
Rather, you right-multiply the input tensor by the transpose
of the weight matrix: (batch_size x features) times (features x num_labels).
Thus the matrix-multiplication dimensions match up properly.
Here is an illustrative script:
import torch

print(torch.__version__)
torch.manual_seed(2020)

lin = torch.nn.Linear(3, 5, bias=False)   # weight shape: (5, 3)
inp = torch.randn(2, 3)                   # batch of 2 samples, 3 features each

print(lin)
print(lin.weight)                          # shape (5, 3)
print(lin(inp))                            # shape (2, 5)
print(inp.matmul(lin.weight.transpose(0, 1)))  # same values as lin(inp)
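To tie this back to the bag-of-words setup in the question, here is a minimal sketch (the vocab_size and num_labels values are made up for illustration, not from the original post) confirming that Linear computes inp @ weight.T, so a 1 x vocab_size input and a num_labels x vocab_size weight are compatible:

```python
import torch

torch.manual_seed(0)
vocab_size, num_labels = 6, 2  # illustrative sizes

# weight has shape (num_labels, vocab_size) = (2, 6)
lin = torch.nn.Linear(vocab_size, num_labels, bias=False)

bow_vector = torch.randn(1, vocab_size)  # shape (1, 6): batch_size x features

# Linear right-multiplies by the transposed weight:
# (1, 6) @ (6, 2) -> (1, 2), so the shapes line up
out = lin(bow_vector)
manual = bow_vector @ lin.weight.T

print(torch.allclose(out, manual))  # True
print(out.shape)                    # torch.Size([1, 2])
```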