My training log looks very strange: the first batch in each epoch always performs better than the later batches.
Here is my training log:
As you can see, the losses decrease (and the metrics increase) in a very regular zigzag pattern, which puzzles me a lot.
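For context, the per-batch numbers come from a loop shaped roughly like this (a simplified sketch, not my exact code; model, criterion, and optimizer are placeholders for my actual setup):

    for epoch in range(args.epochs):
        model.train()
        for i, (inputs, targets) in enumerate(train_loader):
            inputs, targets = inputs.cuda(), targets.cuda()
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
            # loss.item() here is the raw loss of this single batch,
            # not a running average over the epoch
            print(f"epoch {epoch} batch {i}: loss {loss.item():.4f}")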
I’ve shuffled the training data:
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=args.bs,
    shuffle=True,      # reshuffle the data at every epoch
    num_workers=12,
    pin_memory=True,
)
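To rule out a shuffling problem, one quick sanity check (a sketch over index placeholders, not my actual pipeline) is to load just the dataset indices and confirm that the first batch differs across epochs:

    # Sanity check: with shuffle=True the index order should differ per epoch.
    probe_loader = torch.utils.data.DataLoader(
        range(len(train_dataset)), batch_size=args.bs, shuffle=True
    )
    for epoch in range(2):
        first_batch = next(iter(probe_loader))
        print(f"epoch {epoch}: first indices {first_batch[:8].tolist()}")

If the printed indices change between the two epochs, shuffling itself is working as expected.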
So why does this happen? Any tips would be appreciated.