Loss function format for sequence (NER/POS) tagging

I am trying to implement an NER tagger and I'm stuck on the implementation of the loss function. I'm using cross-entropy loss.

The y_pred size is torch.Size([2, 49, 9])
The y_batch size is torch.Size([2, 49])

# Config
BATCH_SIZE = 2
EMBEDDING_SIZE = 5
VOCAB_SIZE = len(word2idx)
TARGET_SIZE = len(tag2idx) # number of output tags is 9
HIDDEN_SIZE = 4
STACKED_LAYERS = 3

length of sentences in the batch = 49
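For context, a stacked-GRU tagger that produces outputs of this shape looks roughly like the sketch below (illustrative only, not my exact model):

import torch
import torch.nn as nn

class GRUTagger(nn.Module):
    # Illustrative stand-in for the actual gru_model used below
    def __init__(self, vocab_size, embedding_size, hidden_size,
                 stacked_layers, target_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_size)
        self.gru = nn.GRU(embedding_size, hidden_size,
                          num_layers=stacked_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, target_size)

    def forward(self, x):
        # x: (batch, seq_len) of word indices
        emb = self.embedding(x)   # (batch, seq_len, embedding_size)
        out, _ = self.gru(emb)    # (batch, seq_len, hidden_size)
        return self.fc(out)       # (batch, seq_len, num_tags)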

Sample input/output:

Sentence: Ronaldo is from portugal.
Tags:     PER     O  O    LOC

Calling loss(y_pred, y_batch) raises the following error:

ValueError: Expected target size (2, 9), got torch.Size([2, 49])

I know that CrossEntropyLoss requires y_pred to be of size (N, C) and y_batch to be of size (N), where N is the batch size and C is the number of classes.

The issue is that I have 9 scores (one per tag) for every word here, so I need to compute the loss over every word in every sentence. That's where I'm tripping up. How do I go about implementing the loss function for this?
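A minimal repro with random tensors standing in for the real batch (criterion is nn.CrossEntropyLoss):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
y_pred = torch.randn(2, 49, 9)          # (batch, seq_len, num_tags)
y_batch = torch.randint(0, 9, (2, 49))  # (batch, seq_len) of tag indices
criterion(y_pred, y_batch)              # raises the ValueError above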

Try to permit y_pred so that you pass it as [2, 9, 49].

I changed the shape of y_pred to [2, 9, 49] using y_pred.view(BATCH_SIZE, TARGET_SIZE, -1).

I then did loss(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1), y_batch). The network started training but my loss kept increasing.

Here’s the training loop:

# Train loop
from torch.nn.utils.rnn import pad_sequence

gru_model.train()
for e in range(1, EPOCHS + 1):
    epoch_loss = 0
    epoch_acc = 0
    for batch in train_loader:
        # Unzip the batch into parallel lists of sentences and tag sequences
        x_batch, y_batch = map(list, zip(*batch))
        x_batch = [torch.tensor(i).to(device) for i in x_batch]
        y_batch = [torch.tensor(i).to(device) for i in y_batch]

        # Pad the tag sequences to a common length: (batch, seq_len)
        y_batch = pad_sequence(y_batch, batch_first=True)

        y_pred = gru_model(x_batch)

        loss = criterion(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1), y_batch)

        optimizer.zero_grad()  # reset gradients from the previous step
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()

    print(f'Epoch {e:03}: | Loss: {epoch_loss/len(train_loader):.5f}')



# Output
'''
Epoch 001: | Loss: 2.11514
Epoch 002: | Loss: 2.12977
Epoch 003: | Loss: 2.16030
Epoch 004: | Loss: 2.17899
Epoch 005: | Loss: 2.17955
Epoch 006: | Loss: 2.18188
Epoch 007: | Loss: 2.19973
Epoch 008: | Loss: 2.19941
Epoch 009: | Loss: 2.20499
Epoch 010: | Loss: 2.19535 
'''

There was one other issue I faced: when I increase the hyperparameters to batch_size=64, stacked_layers=4, hidden_size=8, and embedding_size=128, I get the following error.


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-62-c943c869e71b> in <module>
     20 #         print(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1))
     21 
---> 22         loss = criterion(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1), y_batch)
     23 
     24         loss.backward()

RuntimeError: shape '[64, 9, -1]' is invalid for input of size 108900

Please tell me if you need any additional information/code.

Sorry for the typo, but apparently autocorrect “corrected” permute to permit.

Could you .permute the dimensions instead? view will not yield the desired results:

x = torch.arange(2*3*4).view(2, 3, 4)
y = x.view(2, 4, 3)
z = x.permute(0, 2, 1)

print(x)
> tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
print(y)
> tensor([[[ 0,  1,  2],
         [ 3,  4,  5],
         [ 6,  7,  8],
         [ 9, 10, 11]],

        [[12, 13, 14],
         [15, 16, 17],
         [18, 19, 20],
         [21, 22, 23]]])
print(z)
> tensor([[[ 0,  4,  8],
         [ 1,  5,  9],
         [ 2,  6, 10],
         [ 3,  7, 11]],

        [[12, 16, 20],
         [13, 17, 21],
         [14, 18, 22],
         [15, 19, 23]]])
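
Applied to your loss call, the fix would be (a sketch using your variable names):

# (batch, seq_len, num_tags) -> (batch, num_tags, seq_len)
loss = criterion(y_pred.permute(0, 2, 1), y_batch)

This should also get rid of the RuntimeError you hit with the larger config: permute takes the sizes from the tensor itself instead of hardcoding them, whereas y_pred.view(BATCH_SIZE, TARGET_SIZE, -1) fails whenever the total element count is not divisible by BATCH_SIZE * TARGET_SIZE, which typically happens when the last batch of an epoch has fewer than 64 samples (108900 is not divisible by 64 * 9).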

I tried out permute; I didn't know something like that existed. I changed the shape of y_pred to [2, 9, 49] using y_pred.permute(0, 2, 1), and everything works perfectly now.

Thank you!
