Loss function format for sequence (NER/POS) tagging

I reshaped y_pred to [2, 9, 49] using y_pred.view(BATCH_SIZE, TARGET_SIZE, -1).

I then computed criterion(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1), y_batch). The network starts training, but the loss keeps increasing.
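For reference, here is my understanding of the shapes nn.CrossEntropyLoss expects for sequence tagging, as a minimal standalone sketch with dummy data in place of my model's outputs (the numbers match my case: batch 2, 9 tags, sequence length 49):

```python
import torch
import torch.nn as nn

batch, num_tags, seq_len = 2, 9, 49

# Predictions: [batch, num_classes, seq_len] -- the class dimension
# must be dim 1 for nn.CrossEntropyLoss.
logits = torch.randn(batch, num_tags, seq_len)

# Targets: [batch, seq_len], integer tag indices (no class dimension).
targets = torch.randint(0, num_tags, (batch, seq_len))

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)  # scalar loss
```

This sketch runs without error, so I believe the shape [2, 9, 49] itself is acceptable; my doubt is whether getting there via .view is the right way.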

Here’s the training loop:

# Train loop
import torch
from torch.nn.utils.rnn import pad_sequence

gru_model.train()
for e in range(1, EPOCHS + 1):
    epoch_loss = 0
    for batch in train_loader:
        # batch is a list of (tokens, tags) pairs
        x_batch, y_batch = map(list, zip(*batch))
        x_batch = [torch.tensor(i).to(device) for i in x_batch]
        y_batch = [torch.tensor(i).to(device) for i in y_batch]

        # pad the tag sequences to a common length -> [batch, max_seq_len]
        y_batch = pad_sequence(y_batch, batch_first=True)

        y_pred = gru_model(x_batch)

        loss = criterion(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1), y_batch)

        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()

    print(f'Epoch {e:03}: | Loss: {epoch_loss/len(train_loader):.5f}')



# Output
'''
Epoch 001: | Loss: 2.11514
Epoch 002: | Loss: 2.12977
Epoch 003: | Loss: 2.16030
Epoch 004: | Loss: 2.17899
Epoch 005: | Loss: 2.17955
Epoch 006: | Loss: 2.18188
Epoch 007: | Loss: 2.19973
Epoch 008: | Loss: 2.19941
Epoch 009: | Loss: 2.20499
Epoch 010: | Loss: 2.19535 
'''

There was one other issue I faced: when I increase batch_size=64, stacked_layers=4, hidden_size=8, and embedding_size=128 to larger values, I get the following error.


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-62-c943c869e71b> in <module>
     20 #         print(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1))
     21 
---> 22         loss = criterion(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1), y_batch)
     23 
     24         loss.backward()

RuntimeError: shape '[64, 9, -1]' is invalid for input of size 108900
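I checked the arithmetic on the size reported in the traceback. A minimal sketch (the numbers are taken directly from the error message, so no model is needed):

```python
numel = 108900              # total element count from the RuntimeError
batch_size, target_size = 64, 9

# .view(batch_size, target_size, -1) only succeeds when the total
# element count is divisible by batch_size * target_size.
remainder = numel % (batch_size * target_size)
print(remainder)  # non-zero -> .view raises "shape [64, 9, -1] is invalid"
```

So the element count is not divisible by 64 * 9, which I suspect is why .view fails, but I am not sure what in my setup causes the mismatch.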

Please let me know if you need any additional information or code.