Training accuracy (loss) increases (decreases) in a zigzag way?


My training log looks very strange: the first batch always performs better than the later batches within an epoch.

Here is my training log:

As you can see, the loss decreases (and the accuracy increases) in a distinct zigzag pattern, which puzzles me a lot.

I’ve shuffled the training data:

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True,
    num_workers=12, pin_memory=True)

So why does this happen? Any tips?


Do you plot the loss of the current batch or are you somehow summing / averaging it?
Could you post the code regarding the accuracy and loss calculation?


The complete code is here.

Actually, I plot the averaged loss/precision.
But it still confuses me why the first batch always performs best.
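
For reference, the running average in my code works roughly like the AverageMeter from the PyTorch ImageNet example (a minimal sketch; my actual implementation may differ slightly):

class AverageMeter:
    """Keeps a running average of the values passed to update()."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0.0   # most recent value
        self.sum = 0.0   # running sum of values
        self.count = 0   # number of samples seen
        self.avg = 0.0   # running average

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

Right after reset() the average reflects only a single batch, while later values fold in every batch seen so far in the epoch, so points within an epoch are not directly comparable.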


That is strange, especially since you are shuffling the data.
I couldn’t find any issues by skimming through your code.
Do you see the same effect by just storing the batch losses (without AverageMeter)?
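Something like this minimal sketch would do (the toy dataset, model, and optimizer here are just placeholders for illustration; swap in your own):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins purely for illustration; use your own dataset and model.
train_dataset = TensorDataset(torch.randn(256, 10),
                              torch.randint(0, 2, (256,)))
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batch_losses = []  # one raw value per iteration, no running average

for epoch in range(5):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        batch_losses.append(loss.item())  # store the un-averaged batch loss

If the zigzag disappears when you plot batch_losses directly, the averaging (or where it gets reset) is creating the pattern rather than the training itself.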