Overfitting to first batch


I am training my model on a 3D dataset consisting of 100 samples. The batch size I use is 1 (I cannot use a larger batch size). Although I use shuffle=True in the DataLoader, when I test my model it appears to overfit to the first batch (i.e. the first data sample): the test accuracy on the first batch of the test set is considerably higher than on the rest. It seems my model is overfitting to the first data sample. Any help or thoughts are appreciated.

It is difficult to debug without seeing the data, but here are some things you can check:

  1. Are you initializing with random weights?
  2. Have you tried lowering your learning rate?
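One more quick sanity check, since the question mentions shuffle=True with batch size 1: confirm that the sample order actually changes from epoch to epoch. If a custom sampler or a caching bug reuses a fixed order, the same sample leads every epoch and can dominate the last gradient updates. A minimal stdlib-only sketch of that check (no PyTorch here; the shuffled-index loop stands in for what DataLoader(shuffle=True) does each epoch):

```python
import random

def epoch_order(n_samples, shuffle, rng):
    """Return the index order one training epoch would visit (batch size 1)."""
    order = list(range(n_samples))
    if shuffle:
        rng.shuffle(order)  # mimics DataLoader(shuffle=True): a fresh permutation per epoch
    return order

rng = random.Random(0)
orders = [epoch_order(100, shuffle=True, rng=rng) for _ in range(3)]

# With shuffling on, epochs should visit samples in different orders,
# and sample 0 should not lead every epoch.
print(orders[0][:5], orders[1][:5])
print(all(o[0] == 0 for o in orders))
```

In a real training loop the analogous check is to log the first few sample indices of each epoch and verify they differ.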

Hard to tell without the data and details of what you are doing. However, this doesn't seem like overfitting to me; it sounds more like your model is not converging. This could be due to a poor hyper-parameter choice such as the learning rate, or some other error.

Yes, I initialize with random (uniform) weights. I did try lowering the learning rate, but it did not change anything. I have just been trying different values for the weight decay.

I have a shape correspondence task on the FAUST dataset (100 shapes in total). The batch size is one (almost all models use this batch size and I cannot increase it). I currently use a U-Net and two graph CNN layers in tandem (the U-Net learns point-wise features). I think the model does converge, as the loss decreases and I stop at some point. Here is the accuracy for each batch (i.e. each sample) in the test set:

accuracy: 0.7264
accuracy: 0.6181
accuracy: 0.5443
accuracy: 0.5376
accuracy: 0.5251
accuracy: 0.5210
accuracy: 0.5014
accuracy: 0.4616
accuracy: 0.4720
accuracy: 0.4741
accuracy: 0.4778
accuracy: 0.4814
accuracy: 0.4782
accuracy: 0.4676
accuracy: 0.4705
accuracy: 0.4721
accuracy: 0.4744
accuracy: 0.4756
accuracy: 0.4768
accuracy: 0.4784
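To put a number on how much of an outlier the first test sample is, one can compare its accuracy against the distribution of the remaining per-sample accuracies. A quick stdlib check on the values posted above (the z-score here is just a rough effect-size measure, not a formal test):

```python
from statistics import mean, stdev

# Per-sample test accuracies from the post, in order.
accs = [0.7264, 0.6181, 0.5443, 0.5376, 0.5251, 0.5210, 0.5014,
        0.4616, 0.4720, 0.4741, 0.4778, 0.4814, 0.4782, 0.4676,
        0.4705, 0.4721, 0.4744, 0.4756, 0.4768, 0.4784]

rest = accs[1:]
m, s = mean(rest), stdev(rest)
z = (accs[0] - m) / s  # how many std devs the first sample sits above the rest
print(f"rest mean={m:.4f}, std={s:.4f}, first-sample z-score={z:.1f}")
```

The first sample sits several standard deviations above the others, which supports the observation that it behaves very differently from the rest of the test set rather than the gap being ordinary noise.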