How can I plot training and validation losses when using a cyclic learning rate? If I plot the loss every batch, the curve is noisy.
I’m not sure exactly what you’re asking, but let me make a
few comments that may help address your question.
First, I don’t think “cycling learning rate” really has anything
to do with this. At least my comments would apply equally
well to other training schedules.
If I understand your issue: Averaging the loss over a small
training batch produces a noisy “measurement” of the loss.
This is true.
What I do (when I care about this) is perform a separate
calculation that I don’t use for gradients (or otherwise have
affect my actual training) where I average my loss over the
entire training set (or at least over a subset that is significantly
larger than a batch).
This can be expensive, so, if that’s a problem, you can perform
this big loss calculation after every n batches. For example,
you could do this once every epoch.
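A minimal sketch of that periodic full-dataset evaluation, in plain Python so it stands alone. Here `model` and `loss_fn` are hypothetical stand-ins for your actual network and criterion; in a real framework you would also disable gradient tracking for this pass (e.g. `torch.no_grad()` in PyTorch):

```python
def full_dataset_loss(model, loss_fn, batches):
    """Average per-sample loss over an entire dataset.

    `batches` yields (inputs, targets) pairs; batches may have
    different sizes, so we weight by sample count rather than
    averaging the per-batch averages.
    """
    total_loss = 0.0
    total_samples = 0
    for inputs, targets in batches:
        preds = [model(x) for x in inputs]
        # sum of per-sample losses for this batch
        total_loss += sum(loss_fn(p, t) for p, t in zip(preds, targets))
        total_samples += len(inputs)
    return total_loss / total_samples
```

You would call this every n batches (or once per epoch) and plot the returned value; because the weights are frozen during the pass, each point is the loss of one specific model.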
As a legitimate expedient, you can store the per-batch losses
and after every n steps, you can average them together, and
plot your n-batch average loss. This makes good sense, but
you have to bear in mind that this is not the loss for a model
with any one specific set of weights. It is, of course, the loss
for one set of weights applied to one batch, a somewhat
different set of weights applied to the next batch, and so on.
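The cheaper expedient above can be sketched as a simple post-processing step on the per-batch losses your training loop records (names here are illustrative, not from any particular library):

```python
def averaged_losses(batch_losses, n):
    """Collapse per-batch losses into non-overlapping n-batch averages.

    Each returned value averages n consecutive batch losses; a
    trailing incomplete window is dropped. Note each average mixes
    losses measured under slightly different weights.
    """
    averages = []
    for i in range(0, len(batch_losses) - n + 1, n):
        window = batch_losses[i:i + n]
        averages.append(sum(window) / n)
    return averages
```

Plotting `averaged_losses(losses, n)` gives a much smoother curve than the raw per-batch values, at the cost of the caveat above: no single point corresponds to one fixed set of weights.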
(For the validation loss I just calculate the loss for my entire
validation set. That “measurement” still has noise, but, other
than increasing the size of the validation set, you can’t do
anything about it. Also, these comments about the loss
apply equally well to calculating the accuracy.)