Training accuracy (loss) increases (decreases) in a zigzag way?


(KAI ZHAO) #1

My training log is very strange: the first batch always performs better than the later batches in an epoch.

Here is my training log

As you can see, the loss decreases (and the accuracy increases) in a very obvious zigzag way, which puzzles me a lot.

I’ve shuffled the training data:

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.bs, shuffle=True,
    num_workers=12, pin_memory=True
)

So why does something like this happen? Any tips?


#2

Do you plot the loss of the current batch or are you somehow summing / averaging it?
Could you post the code regarding the accuracy and loss calculation?
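If it's a running average that is reset at the start of every epoch, the first logged value of an epoch is based on a single batch only, which by itself can produce a sawtooth across epochs. A minimal sketch of the kind of meter I mean (a hypothetical one, not necessarily yours):

class AverageMeter:
    """Running average of a scalar, e.g. the batch loss."""
    def __init__(self):
        self.sum, self.count = 0.0, 0

    def update(self, value, n=1):
        # accumulate the batch value; if the meter is re-created every
        # epoch, the first avg of an epoch reflects a single batch only
        self.sum += value * n
        self.count += n

    @property
    def avg(self):
        return self.sum / max(self.count, 1)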


(KAI ZHAO) #3

The complete code is here https://gist.github.com/zeakey/9d1c313329a7ea32ea12ae0f3a8db09f.

Actually, I do plot the averaged loss/precision.
But it still confuses me why the first batch always performs best.


#4

Especially since you are shuffling the data.
I couldn’t find any issues by skimming through your code.
Do you see the same effect by just storing the batch losses (without AverageMeter)?
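Something along these lines would be enough for the comparison (a rough sketch; model, criterion and optimizer stand in for yours):

batch_losses = []  # raw, unaveraged per-batch values

for images, targets in train_loader:
    output = model(images)
    loss = criterion(output, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # store the instantaneous batch loss instead of a running average
    batch_losses.append(loss.item())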


(KAI ZHAO) #5

I just removed the AverageMeter and stored the instantaneous values instead. Here is the complete code: https://gist.github.com/zeakey/9d1c313329a7ea32ea12ae0f3a8db09f#file-train_center_exclusve_filter-py.

Strangely, the loss (and top-1 accuracy) still oscillates in a zigzag manner: the first batch always gets the highest accuracy and the lowest loss.


#6

Thanks for the code! I’ll have a look at it.
Which script is behaving strangely, train_exclusive.py or train_center_exclusve_filter.py?

Could you tell me which resolution your dataset (CASIA) has? I would use a random dataset first and check if there are any obvious mistakes.
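As a sanity check, something like this could temporarily replace the real loader (a minimal sketch; the resolution and number of classes are placeholders, so adjust them to your setup):

import torch
from torch.utils.data import TensorDataset, DataLoader

# random inputs and labels of the expected shape, just to see whether
# the zigzag also shows up with meaningless data
num_samples, num_classes = 1024, 10
images = torch.randn(num_samples, 3, 112, 96)   # adjust to your input resolution
labels = torch.randint(0, num_classes, (num_samples,))

fake_loader = DataLoader(TensorDataset(images, labels),
                         batch_size=64, shuffle=True)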

Are the other arguments left at their default values when the strange behavior occurs?


(Peter Dekkers) #7

I didn't go through your code, but in general a possible cause for this type of behaviour is a feature set that isn't normalised properly.

For example, suppose one of the features isn't normalised at all and has values between 0 and 255. The optimiser's weight updates then overshoot for that feature, so the path to the optimum is not straight (hence the zigzag).
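If the inputs really are raw 0-255 pixel values, normalising them to a common scale is the usual fix, roughly like this (just a sketch with torchvision transforms; the mean/std values are placeholders you would compute on your own data):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ToTensor(),                       # uint8 0-255 -> float in [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],   # placeholder statistics
                         std=[0.5, 0.5, 0.5]),
])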

I assume that at around 8000 iterations you reduce the learning rate; the weight updates then become smaller and so does the "overshoot" (hence the smaller zigzag).


(KAI ZHAO) #8

Both train_exclusive.py and train_center_exclusve_filter.py behave strangely.

In the latter I removed the AverageMeter and stored the instantaneous values.

A figure of the logs is here: http://data.kaiz.xyz/data/exloss_record1.pdf, where the ExLoss decreases in a typical zigzag way.

Though the model finally converged and seems to work well, I just want to know what caused the strange behavior.


(Deeply) #9

I am having the same zigzagging loss values in one of my projects.
In that project I am dealing with multi-label targets (a multi-hot encoding) of size 600.
I think this rather large multi-hot binary output is the reason for the zigzagging.

So, large variations in the input/output could be the reason for this phenomenon. I would not worry about it as long as the model converges.
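For context, my setup is roughly the usual multi-label recipe (a minimal sketch, not my exact code):

import torch
import torch.nn as nn

num_labels = 600
criterion = nn.BCEWithLogitsLoss()   # per-label sigmoid + binary cross-entropy

logits = torch.randn(32, num_labels)                      # raw outputs for a batch of 32
targets = torch.randint(0, 2, (32, num_labels)).float()   # multi-hot targets

loss = criterion(logits, targets)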


(KAI ZHAO) #10

Thanks for the information.

Actually, my model converged as well, but I still want to know what causes this strange behavior.