Training accuracy (loss) increases (decreases) in a zigzag way?


(KAI ZHAO) #1

My training log is strange: the first batch always performs better than later batches within an epoch.

Here is my training log:

As you can see, the loss (accuracy) decreases (increases) in a very pronounced zigzag pattern, which puzzles me a lot.

I’ve shuffled the training data:

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.bs, shuffle=True,
    num_workers=12, pin_memory=True
)

Why does something like this happen? Any tips?


#2

Do you plot the loss of the current batch or are you somehow summing / averaging it?
Could you post the code regarding the accuracy and loss calculation?
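For context on why the averaging matters: a running-average meter in the style of the PyTorch ImageNet example (assuming the poster's meter works roughly like this) is usually reset at each epoch boundary, so the first logged value of an epoch reflects only a single batch, while later values average many batches. A minimal sketch:

```python
class AverageMeter:
    """Minimal running-average meter in the style of the
    PyTorch ImageNet example (assumed, not the poster's exact code)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        self.sum += val * n
        self.count += n

    @property
    def avg(self):
        return self.sum / self.count


# If the meter is reset at every epoch boundary, the first .avg of an
# epoch is just the first batch's loss; only later values smooth over
# many batches. That alone can make epoch starts look like outliers.
meter = AverageMeter()
for epoch in range(2):
    meter.reset()
    for batch_loss in [0.9, 0.5, 0.7]:  # made-up batch losses
        meter.update(batch_loss)
        print(f"epoch {epoch} running avg: {meter.avg:.3f}")
```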


(KAI ZHAO) #3

The complete code is here https://gist.github.com/zeakey/9d1c313329a7ea32ea12ae0f3a8db09f.

I do plot the averaged loss/precision.
But I'm still confused about why the first batch always performs best.


#4

That is strange, especially since you are shuffling the data.
I couldn’t find any issues by skimming through your code.
Do you see the same effect by just storing the batch losses (without AverageMeter)?
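To make that suggestion concrete, here is a sketch of logging raw per-batch losses into a plain list instead of a meter. The `train_step` below is a hypothetical stand-in (it just simulates a noisy, slowly decreasing loss), not the poster's actual training code:

```python
import random


def train_step(step):
    """Hypothetical stand-in for the real forward/backward pass;
    returns a simulated noisy, slowly decreasing batch loss."""
    return 1.0 / (1 + 0.01 * step) + random.uniform(-0.05, 0.05)


num_epochs, batches_per_epoch = 3, 100  # made-up sizes
batch_losses = []  # raw loss of every batch, no AverageMeter
step = 0
for epoch in range(num_epochs):
    for _ in range(batches_per_epoch):
        batch_losses.append(train_step(step))
        step += 1

# Plotting batch_losses directly shows whether the zigzag is in the
# per-batch losses themselves or an artifact of the per-epoch
# running average.
```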