Training loss suddenly explodes

jaewoo_song · February 9, 2021, 5:34am

Hi, I’m trying to fine-tune BERT for entity recognition task.

While training, the train loss decreases steadily and evaluation scores increase as I expected, but suddenly at a certain point, the train loss becomes large and all evaluation scores become 0.

I think this might be vanishing/exploding gradient, but I don’t know how to fix this.

I added gradient clipping and learning rate scheduling which reduce the learning rate if the validation score does not decrease.

And I also initialized the weights of output layer for token-level classification into the xavier-uniform.

Here are specific details for my training.

encoder: bert-base-uncased
initialization of output layer: xavier uniform
batch size: 8
starting learning rate: 5e-5
scheduler: ReduceLROnPlateau
- factor: 0.1
- patience: 3 epochs
- threshold: 1e-3
optimizer: AdamW
loss: CrossEntropy
gradient clipping: 1.0

Thank you.