How can I check whether my transformer blocks are working in my classifier? (with Colab link)

Hey,

I'm currently working on a classification transformer, using the IMDB sentiment dataset.
I'm fairly unsure whether my model is actually working, because when I remove the transformer blocks the loss is only about 8% higher than with them.

I am using the pretrained 100-dimensional GloVe word embedding vectors, which is why my hidden dim is also only 100.

model specs:

INPUT_DIM = 20000
HID_DIM = 100
OUT_DIM = 1
N_LAYERS = 2
N_HEADS = 4
FF_DIM = 32
MAXLEN = 100

optimizer = torch.optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()
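For reference, the specs above roughly correspond to a model like the following. This is only a sketch, since the actual notebook isn't shown here: the class name, the learned positional embedding, and the mean-pooling head are assumptions, and the built-in `nn.TransformerEncoder` stands in for the custom encoder layers.

```python
import torch
import torch.nn as nn

INPUT_DIM, HID_DIM, OUT_DIM = 20000, 100, 1
N_LAYERS, N_HEADS, FF_DIM, MAXLEN = 2, 4, 32, 100

class SentimentTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        # In the notebook this embedding would be initialized from GloVe.
        self.embed = nn.Embedding(INPUT_DIM, HID_DIM)
        # Learned positional embedding (an assumption; sinusoidal also works).
        self.pos = nn.Embedding(MAXLEN, HID_DIM)
        layer = nn.TransformerEncoderLayer(
            d_model=HID_DIM, nhead=N_HEADS,
            dim_feedforward=FF_DIM, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        self.out = nn.Linear(HID_DIM, OUT_DIM)

    def forward(self, x):  # x: (batch, seq_len) of token ids
        positions = torch.arange(x.size(1), device=x.device)
        h = self.encoder(self.embed(x) + self.pos(positions))
        # Mean-pool over the sequence, then one logit for BCEWithLogitsLoss.
        return self.out(h.mean(dim=1))

model = SentimentTransformer()
logits = model(torch.randint(0, INPUT_DIM, (4, MAXLEN)))
```

With `OUT_DIM = 1` and `BCEWithLogitsLoss`, the model returns raw logits of shape `(batch, 1)` and the sigmoid is applied inside the loss.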

Google Colab Link:

Can someone please tell me how to check whether these attention layers are working?
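One way to sanity-check an attention block in isolation: verify that the attention weights form a valid distribution over the keys, and that gradients actually flow through the block's parameters. A minimal sketch, using `torch.nn.MultiheadAttention` as a stand-in for the custom `MultiHeadAttention` (which isn't shown here):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the custom MultiHeadAttention; swap in your own module here.
attn = nn.MultiheadAttention(embed_dim=100, num_heads=4, batch_first=True)
x = torch.randn(2, 10, 100)     # (batch, seq_len, hid_dim)

out, weights = attn(x, x, x)    # self-attention

# Check 1: each row of attention weights is a softmax distribution,
# so it must sum to 1 over the keys.
row_sums = weights.sum(dim=-1)  # (batch, seq_len), should be all ones
assert torch.allclose(row_sums, torch.ones_like(row_sums), atol=1e-5)

# Check 2: gradients flow through all attention parameters, i.e. the
# block is part of the computation graph and not dead weight.
out.sum().backward()
grads = [p.grad.abs().sum().item() for p in attn.parameters()]
assert all(g > 0 for g in grads)
```

If both checks pass for your own module too, the next question is whether the blocks help the task, which the loss-curve comparison below gets at.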

Hi,

from what I've seen, your MultiHeadAttention and EncoderLayer implementations should work. Could you show a continuous plot of both losses, please?

Regards,
Unity05

For that I need to train the model again; I can send you the losses later.
But I forgot to mention that the model with transformer blocks seems to overfit, since the validation accuracy was around 80% in each of the 5 epochs. I think that's because I used the fairly small dimension of the pretrained GloVe word vectors (100) as the hid_dim as well.

I’ve just seen that you only stack two encoder layers, which might be too few. As for your hidden dimension: shouldn’t hid_dim = embed_dim anyway, since using different dimensions would complicate things? You could also try the 300d vectors, for instance.
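Setting hid_dim = embed_dim means the pretrained vectors can feed the encoder directly, with no extra projection layer. A minimal sketch of loading the 300d vectors; the `glove_vectors` tensor here is a random stand-in for the real matrix built from the glove 300d file aligned with your vocab:

```python
import torch
import torch.nn as nn

# Stand-in for the real GloVe matrix (shape: vocab_size x 300). In the
# notebook this would be built from the pretrained 300d file, with rows
# aligned to your vocabulary's token ids.
glove_vectors = torch.randn(20002, 300)

# freeze=False lets the vectors fine-tune; freeze=True keeps them fixed,
# which can reduce overfitting on a small dataset.
embed = nn.Embedding.from_pretrained(glove_vectors, freeze=False)

# With hid_dim = embed_dim = 300, embed(x) goes straight into the encoder.
x = torch.randint(0, 20002, (4, 100))
assert embed(x).shape == (4, 100, 300)
```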

I removed the separate emb_dim and switched to the 300d pretrained word embeddings.

INPUT_DIM = 20002
HID_DIM = 300
OUT_DIM = 1
N_LAYERS = 4
N_HEADS = 6
FF_DIM = HID_DIM * 2
MAXLEN = 100

Epoch: 01 | Epoch Time: 1m 15s
Train Loss: 0.554 | Train Acc: 71.35%
Val. Loss: 0.467 | Val. Acc: 77.56%
Epoch: 02 | Epoch Time: 1m 15s
Train Loss: 0.402 | Train Acc: 81.82%
Val. Loss: 0.457 | Val. Acc: 79.22%
Epoch: 03 | Epoch Time: 1m 15s
Train Loss: 0.359 | Train Acc: 84.33%
Val. Loss: 0.526 | Val. Acc: 74.28%

Training results from the old config (with transformer blocks):

INPUT_DIM = 20002
HID_DIM = 300
OUT_DIM = 1
N_LAYERS = 2
N_HEADS = 2
FF_DIM = 128
MAXLEN = 100

Epoch: 01 | Epoch Time: 0m 30s
Train Loss: 0.464 | Train Acc: 77.80%
Val. Loss: 0.421 | Val. Acc: 81.38%
Epoch: 02 | Epoch Time: 0m 30s
Train Loss: 0.314 | Train Acc: 86.78%
Val. Loss: 0.442 | Val. Acc: 80.23%
Epoch: 03 | Epoch Time: 0m 30s
Train Loss: 0.227 | Train Acc: 91.24%
Val. Loss: 0.524 | Val. Acc: 79.14%
Epoch: 04 | Epoch Time: 0m 30s
Train Loss: 0.159 | Train Acc: 94.36%
Val. Loss: 0.643 | Val. Acc: 77.77%
Epoch: 05 | Epoch Time: 0m 30s
Train Loss: 0.118 | Train Acc: 95.82%
Val. Loss: 0.622 | Val. Acc: 78.18%

Training results from the old config (without transformer blocks):

Epoch: 01 | Epoch Time: 0m 4s
Train Loss: 0.607 | Train Acc: 69.78%
Val. Loss: 0.521 | Val. Acc: 76.43%
Epoch: 02 | Epoch Time: 0m 4s
Train Loss: 0.428 | Train Acc: 81.96%
Val. Loss: 0.432 | Val. Acc: 80.48%
Epoch: 03 | Epoch Time: 0m 4s
Train Loss: 0.343 | Train Acc: 85.94%
Val. Loss: 0.409 | Val. Acc: 81.34%
Epoch: 04 | Epoch Time: 0m 4s
Train Loss: 0.292 | Train Acc: 88.55%
Val. Loss: 0.417 | Val. Acc: 80.56%
Epoch: 05 | Epoch Time: 0m 4s
Train Loss: 0.255 | Train Acc: 90.13%
Val. Loss: 0.412 | Val. Acc: 81.23%
