Understanding potential issues with transformers

Hi, I’m trying to understand whether there’s a problem with the training procedure for my transformer encoder.

So, basically I’m trying to train a transformer encoder for classification on a synthetic dataset of shape [batch_size, vocab_size, dim], generated on the fly, where vocab_size varies from batch to batch.
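Roughly, the setup looks like the sketch below: an nn.TransformerEncoder with a mean-pooled linear classification head, fed batches generated on the fly. All the concrete sizes (dim, number of heads/layers, number of classes) are placeholders, not my real values.

```python
import torch
import torch.nn as nn

class EncoderClassifier(nn.Module):
    """Transformer encoder with a mean-pooled classification head."""
    def __init__(self, dim=128, num_classes=10, nhead=4, num_layers=2, dropout=0.1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=nhead, dropout=dropout, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                # x: [batch_size, seq_len, dim]
        h = self.encoder(x)              # [batch_size, seq_len, dim]
        return self.head(h.mean(dim=1))  # pool over the sequence dimension

# One synthetic batch generated on the fly; the middle dimension varies per batch.
batch_size, dim, num_classes = 32, 128, 10
seq_len = torch.randint(5, 50, (1,)).item()
x = torch.randn(batch_size, seq_len, dim)
y = torch.randint(0, num_classes, (batch_size,))

model = EncoderClassifier(dim=dim, num_classes=num_classes)
loss = nn.CrossEntropyLoss()(model(x), y)
```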

During training I’ve observed that the loss and accuracy oscillate a lot: they improve, but occasionally the model hits a hard example, performance drops, and it takes a long time to recover.

When I plot the training loss and accuracy I get plots like the following. Since I don’t know much about NLP and transformers, I was wondering whether this is normal or an indication that something is going wrong.

[Image: training loss and accuracy plots]

Have you tried adjusting your learning rate? Also, what have you set your dropout to?
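If you haven’t yet, a warmup schedule is one common way of “adjusting the learning rate” for transformers, since a large learning rate at the very start often causes exactly this kind of instability. A minimal sketch with torch.optim.lr_scheduler.LambdaLR (the toy model, base LR, warmup length, and decay rule below are just illustrative):

```python
import torch
import torch.nn as nn

# Toy model just to have parameters to optimize.
model = nn.Linear(128, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps = 1000

def lr_lambda(step):
    # Ramp the LR up linearly for warmup_steps, then decay with inverse square root.
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    return (warmup_steps / (step + 1)) ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# In the training loop, call optimizer.step() and then scheduler.step() each batch.
```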

Yes, I’ve tried changing the learning rate, and dropout is set to 0.1.
I’m also using weight decay, but with a very small value (5e-4), plus gradient clipping at 0.5.
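To be concrete, the optimizer side of my training step looks roughly like this (the optimizer choice and base learning rate are placeholders; only the weight decay and clipping values match what I described):

```python
import torch
import torch.nn as nn

# Uses the model and synthetic batch from the sketch in my first post.
# Adam and lr=1e-4 are placeholders; weight_decay=5e-4 and max_norm=0.5 are the real values.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Clip the global gradient norm to 0.5 before the optimizer step.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
    optimizer.step()
    return loss.item()
```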