How to handle spikes in the loss at some epochs?

During training, I noticed spikes in the loss at certain epochs, as shown below:
[screenshot of the training loss curve with spikes]

Based on the previous loss, if the difference between the current and previous loss is too large, I want to skip the gradient update. Is there a function in PyTorch to implement this check? Currently I do it like this:


import torch.nn.functional as F

current_loss = F.cross_entropy(input, target)
# Only backpropagate when the loss has not jumped too far above the previous loss
if current_loss.item() - prev_loss < threshold:
    current_loss.backward()
    prev_loss = current_loss.item()

Exploding and Vanishing Gradients

If you suffer from exploding gradients, I recommend using clip_grad_norm_.
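
A minimal sketch of where the clipping call typically sits in a training loop; the model, criterion, optimizer, and the max_norm value of 1.0 are placeholders, not something from this thread:

from torch.nn.utils import clip_grad_norm_

optimizer.zero_grad()
loss = criterion(model(input), target)
loss.backward()
# Rescale all gradients in place so their combined norm does not exceed max_norm
clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()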

@Tony-Y: Thanks, but how do I know the lower and upper bound of the gradient to pass to the function?

One good heuristic for setting this threshold is to look at statistics on the average norm over a sufficiently large number of updates. In our experiments we have noticed that for a given task and model size, training is not very sensitive to this hyper-parameter and the algorithm behaves well even for rather small thresholds.
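
One way to apply that heuristic is to record the total gradient norm over a number of updates and use its average (or a multiple of it) as the clipping threshold; a rough sketch, where the loop variables and the choice of statistic are assumptions rather than anything prescribed by the quote:

import torch

grad_norms = []
for input, target in loader:
    optimizer.zero_grad()
    loss = criterion(model(input), target)
    loss.backward()
    # Total L2 norm over all parameter gradients for this update
    total_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    grad_norms.append(total_norm.item())
    optimizer.step()

# Average observed norm as a starting point for the clipping threshold
threshold = sum(grad_norms) / len(grad_norms)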