Training accuracy (loss) increases (decreases) in a zigzag way?


My training log looks very strange: the first batch always performs better than the later batches within an epoch.

Here is my training log:

As you can see, the loss decreases (and the accuracy increases) in a distinct zigzag pattern, which puzzles me a lot.

I’ve shuffled the training data:

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True,
    num_workers=12, pin_memory=True)

So why does this happen? Any tips?


Do you plot the loss of the current batch or are you somehow summing / averaging it?
Could you post the code regarding the accuracy and loss calculation?


The complete code is here.

Actually, I plot the averaged loss/precision.
But it still confuses me why the first batch always performs best.
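
For reference, the running average in my code works roughly like the AverageMeter from the PyTorch ImageNet example (a minimal sketch; my actual implementation may differ slightly):

class AverageMeter:
    """Keeps a running average of the values passed to update()."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0.0   # most recent value
        self.sum = 0.0   # running sum of values
        self.count = 0   # number of samples seen
        self.avg = 0.0   # running average

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

Right after reset() the average reflects only a single batch, while later values fold in every batch seen so far in the epoch, so points within an epoch are not directly comparable.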


That is strange, especially since you are shuffling the data.
I couldn’t find any issues by skimming through your code.
Do you see the same effect by just storing the batch losses (without AverageMeter)?
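Something like this minimal sketch would do (the toy dataset, model, and optimizer here are just placeholders for illustration; swap in your own):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins purely for illustration; use your own dataset and model.
train_dataset = TensorDataset(torch.randn(256, 10),
                              torch.randint(0, 2, (256,)))
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batch_losses = []  # one raw value per iteration, no running average

for epoch in range(5):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        batch_losses.append(loss.item())  # store the un-averaged batch loss

If the zigzag disappears when you plot batch_losses directly, the averaging (or where it gets reset) is creating the pattern rather than the training itself.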