Hello. I know this question has been asked many times in various communities, but I’m still having trouble grasping it.
I’m currently implementing the continuous bag-of-words (CBOW) model using PyTorch. I’m facing some problems when implementing the cross entropy loss, though. Here’s the portion of code that’s causing the problem:
for idx, sample in enumerate(self.train_data):
    x = torch.tensor(sample[0], dtype=torch.long)
    y = np.zeros(shape=(self.vocab_size))  # self.vocab_size = 85,000
    y[int(sample[1])] = np.float64(1)
    y = torch.tensor(y, dtype=torch.long)
    if torch.cuda.is_available():
        x = x.cuda()
        y = y.cuda()
    optimizer.zero_grad()
    output = self.model(x)  # output's shape is the same as self.vocab_size
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
To briefly explain my code: the model that I’ve implemented averages the embeddings of the context words and applies a linear projection that maps the result to a vector whose length equals the vocabulary size. This vector is then passed through a softmax function.
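A minimal sketch of a model matching that description, for anyone who wants something concrete to run. It is not the exact code from this post: the class name CBOW, the attribute names, and embedding_dim are placeholders chosen for illustration.

```python
import torch
import torch.nn as nn


class CBOW(nn.Module):
    """Hypothetical stand-in for the model described above."""

    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear = nn.Linear(embedding_dim, vocab_size)

    def forward(self, context):
        # context: 1-D tensor of word indices, shape (context_size,)
        embedded = self.embeddings(context)   # (context_size, embedding_dim)
        averaged = embedded.mean(dim=0)       # (embedding_dim,)
        scores = self.linear(averaged)        # (vocab_size,)
        # Softmax over the vocabulary, as described in the post.
        return torch.softmax(scores, dim=0)
```

With an unbatched context like this, the output is a 1-D tensor of length vocab_size, which matches the comment in the training loop.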
The contents of self.train_data are basically (context, target_word) pairs. y is a one-hot encoded array of the token.
I’m aware that the second input to nn.CrossEntropyLoss is C = # of classes, but I’m not sure where my code went wrong. The vocabulary size is 85,000, so isn’t the number of classes 85,000?
If I change the input to
loss = criterion(output, 85000)
I get the same error:
*** RuntimeError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
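For reference, here is a minimal sketch of the shapes that the nn.CrossEntropyLoss documentation describes, assuming a toy vocabulary of 10 words and a single sample (the sizes and variable names are made up for illustration). As far as the documentation goes, the input is a tensor of unnormalized scores with shape (N, C), and the target is a tensor of class indices with shape (N), rather than a one-hot vector or the number of classes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size = 10     # toy stand-in for the real 85,000-word vocabulary
target_index = 3    # hypothetical index of the target word

criterion = nn.CrossEntropyLoss()

# Raw scores (logits) for one sample: shape (N, C) = (1, vocab_size).
# CrossEntropyLoss applies log-softmax internally, so no softmax is applied here.
logits = torch.randn(1, vocab_size, requires_grad=True)

# The target is the class index itself: shape (N,) = (1,), dtype long.
target = torch.tensor([target_index], dtype=torch.long)

loss = criterion(logits, target)
loss.backward()

# The same value computed by hand: negative log-probability of the target class.
manual = -torch.log_softmax(logits, dim=1)[0, target_index]
```

In this sketch the number of classes C never appears as an argument; it is implied by the second dimension of logits.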
What am I doing wrong, and how should I understand the input to PyTorch’s cross entropy loss?
Thanks.