I am trying to implement an NER tagger and I'm stuck on the implementation of the loss function. I'm using cross-entropy loss.
The y_pred size is torch.Size([2, 49, 9]).
The y_batch size is torch.Size([2, 49]).
```python
# Config
BATCH_SIZE = 2
EMBEDDING_SIZE = 5
VOCAB_SIZE = len(word2idx)
TARGET_SIZE = len(tag2idx)  # number of output tags is 9
HIDDEN_SIZE = 4
STACKED_LAYERS = 3
# length of sentences in the batch = 49
```
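For context, a minimal model along these lines (my assumption of the architecture; the question only gives the config constants) would produce logits of the shape reported above:

```python
import torch
import torch.nn as nn

# Constants taken from the question's config; VOCAB_SIZE is a stand-in value.
BATCH_SIZE, EMBEDDING_SIZE, VOCAB_SIZE = 2, 5, 100
TARGET_SIZE, HIDDEN_SIZE, STACKED_LAYERS = 9, 4, 3

class Tagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBEDDING_SIZE)
        self.lstm = nn.LSTM(EMBEDDING_SIZE, HIDDEN_SIZE,
                            num_layers=STACKED_LAYERS, batch_first=True)
        self.fc = nn.Linear(HIDDEN_SIZE, TARGET_SIZE)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))      # (batch, seq_len, HIDDEN_SIZE)
        return self.fc(out)                    # (batch, seq_len, TARGET_SIZE) logits

x = torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, 49))
y_pred = Tagger()(x)
print(y_pred.shape)  # torch.Size([2, 49, 9])
```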
Sample input/output:
Sentence: Ronaldo is from portugal.
Tags: PER O O LOC
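To make the sample concrete, encoding it might look like this (a hypothetical sketch: the real word2idx/tag2idx are dicts built from the training data, and tag2idx actually has 9 tags, not the 3 shown here):

```python
# Hypothetical minimal vocabularies for illustration only.
word2idx = {"<pad>": 0, "ronaldo": 1, "is": 2, "from": 3, "portugal": 4}
tag2idx = {"O": 0, "PER": 1, "LOC": 2}

sentence = "Ronaldo is from portugal".split()
tags = ["PER", "O", "O", "LOC"]

x = [word2idx[w.lower()] for w in sentence]  # word indices
y = [tag2idx[t] for t in tags]               # tag indices, one per word
print(x, y)  # [1, 2, 3, 4] [1, 0, 0, 2]
```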
```python
loss(y_pred, y_batch)
```
Error:
```
ValueError: Expected target size (2, 9), got torch.Size([2, 49])
```
I know that CrossEntropyLoss requires y_pred to be of size (N, C) and y_batch to be of size (N), where N is the batch size and C is the number of classes. For higher-dimensional input it expects (N, C, d1, ...) against a target of (N, d1, ...); since the class dimension must come second, my (2, 49, 9) input is being read as 2 samples x 49 classes x 9 positions, which is why it expects a target of size (2, 9).
The issue is that I have 9 logits (one score per tag) for every word, so the loss needs to be computed per word across the whole sequence. That's where I'm tripping up. How do I go about implementing the loss function for this?
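Two equivalent ways to make the shapes line up (a sketch using the shapes from the question; `y_pred` and `y_batch` here are random stand-ins):

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

y_pred = torch.randn(2, 49, 9)            # (batch, seq_len, num_tags) logits
y_batch = torch.randint(0, 9, (2, 49))    # (batch, seq_len) tag indices

# Fix 1: move the class dimension to position 1, giving (N, C, L) vs (N, L).
loss_a = loss_fn(y_pred.permute(0, 2, 1), y_batch)

# Fix 2: flatten batch and sequence into one axis: (N*L, C) vs (N*L,).
loss_b = loss_fn(y_pred.reshape(-1, 9), y_batch.reshape(-1))

print(torch.allclose(loss_a, loss_b))  # True
```

With the default mean reduction both give the same scalar; if the batch is padded, passing `ignore_index` for the padding tag keeps padded positions out of the average.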