I reshaped y_pred to [2, 9, 49] using y_pred.view(BATCH_SIZE, TARGET_SIZE, -1) and then called loss(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1), y_batch). The network starts training, but the loss keeps increasing.
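For reference, the reshape in isolation looks like this (a minimal sketch; the random tensor is just a stand-in for my model output, and 2, 9, 49 are the batch size, target length, and number of classes from my setup):

```python
import torch

BATCH_SIZE, TARGET_SIZE, NUM_CLASSES = 2, 9, 49  # values from my setup

# Stand-in for the flattened model output (random values, illustration only).
y_pred = torch.randn(BATCH_SIZE * TARGET_SIZE * NUM_CLASSES)

# Reshape to [batch, seq, classes]; this is the tensor I pass to the loss.
y_pred = y_pred.view(BATCH_SIZE, TARGET_SIZE, -1)
print(y_pred.shape)  # torch.Size([2, 9, 49])
```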
Here’s the training loop:
# Train loop
gru_model.train()
for e in range(1, EPOCHS + 1):
    epoch_loss = 0
    epoch_acc = 0
    for batch in train_loader:
        x_batch, y_batch = map(list, zip(*batch))
        x_batch = [torch.tensor(i).to(device) for i in x_batch]
        y_batch = [torch.tensor(i).to(device) for i in y_batch]
        y_batch = pad_sequence(y_batch, batch_first=True)
        y_pred = gru_model(x_batch)
        loss = criterion(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1), y_batch)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f'Epoch {e:03}: | Loss: {epoch_loss / len(train_loader):.5f}')
# Output
'''
Epoch 001: | Loss: 2.11514
Epoch 002: | Loss: 2.12977
Epoch 003: | Loss: 2.16030
Epoch 004: | Loss: 2.17899
Epoch 005: | Loss: 2.17955
Epoch 006: | Loss: 2.18188
Epoch 007: | Loss: 2.19973
Epoch 008: | Loss: 2.19941
Epoch 009: | Loss: 2.20499
Epoch 010: | Loss: 2.19535
'''
There is one other issue I faced: when I increase batch_size=64, stacked_layers=4, hidden_size=8, and embedding_size=128 to bigger numbers, I get the following error.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-62-c943c869e71b> in <module>
20 # print(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1))
21
---> 22 loss = criterion(y_pred.view(BATCH_SIZE, TARGET_SIZE, -1), y_batch)
23
24 loss.backward()
RuntimeError: shape '[64, 9, -1]' is invalid for input of size 108900
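I can reproduce the shape error in isolation. My guess (just an assumption on my part) is that the final batch from the DataLoader has fewer than BATCH_SIZE samples, or the padded target length differs, so hard-coding the batch dimension in view() fails; 108900 is not divisible by 64 * 9. A sketch with a hypothetical partial batch of 50 samples:

```python
import torch

BATCH_SIZE, TARGET_SIZE, NUM_CLASSES = 64, 9, 49

# A final, partial batch (50 samples instead of 64) makes the element
# count indivisible by BATCH_SIZE * TARGET_SIZE, so view() raises.
last_batch = torch.randn(50 * TARGET_SIZE * NUM_CLASSES)
try:
    last_batch.view(BATCH_SIZE, TARGET_SIZE, -1)
except RuntimeError as err:
    print(err)  # shape '[64, 9, -1]' is invalid for input of size 22050

# Deriving the batch dimension instead of hard-coding it avoids the error:
ok = last_batch.view(-1, TARGET_SIZE, NUM_CLASSES)
print(ok.shape)  # torch.Size([50, 9, 49])
```

Is this the right diagnosis, or is something else going on?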
Please tell me if you need any additional information/code.