Overfitting to first batch

Mohammad_Farazi · March 21, 2022, 12:20pm

Hi,

I am training my model on a 3D dataset consisting of 100 samples. The batch size I use is 1 (I cannot use a larger batch size). Although I use shuffle=True in the DataLoader, my model seems to overfit to the first batch (i.e. the first data sample) when I test it: the test accuracy on the first batch of the test set is considerably higher than on the rest. It seems my model is overfitting to the first data sample. Any help or thoughts are appreciated.

nivek · March 21, 2022, 2:47pm

It is difficult to debug without seeing the data, but here are some things you can check:

Are you initializing with random weights?
Have you tried lowering your learning rate?

John_Palmer · March 21, 2022, 3:49pm

It's hard to tell without the data and details of what you are doing. However, this doesn't seem like overfitting to me; it sounds more like your model is not converging. This could be due to a poor hyperparameter choice, such as the learning rate, or some other error.
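The learning-rate effect described above can be illustrated with a one-dimensional toy problem. This is a minimal stdlib-only sketch, assuming a quadratic loss f(w) = 0.5 * c * w**2; the function name and the constants `curvature`, `w0` and `steps` are illustrative and not taken from the thread, and this is not the poster's model. Each gradient step multiplies w by (1 - lr * c), so the iteration only converges when 0 < lr < 2 / c.

```python
def gradient_descent(lr: float, curvature: float = 10.0, w0: float = 1.0, steps: int = 50) -> float:
    """Minimise the toy loss f(w) = 0.5 * curvature * w**2 with plain gradient descent.

    Returns the loss after `steps` updates. Illustrative 1-D example only.
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w   # df/dw for the quadratic loss
        w = w - lr * grad      # plain gradient descent update
    return 0.5 * curvature * w ** 2


for lr in (0.25, 0.05, 0.01):
    # With curvature 10, any lr >= 0.2 makes |1 - lr * curvature| >= 1, so w never shrinks.
    print(f"lr={lr}: final loss {gradient_descent(lr):.3e}")
```

With c = 10, lr = 0.25 diverges, lr = 0.05 converges quickly, and lr = 0.01 converges but more slowly. A real network has many curvatures at once, so this only hints at why lowering the learning rate can matter.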

Mohammad_Farazi · March 21, 2022, 4:48pm

Yes, the weights are initialized randomly from a uniform distribution. I did try lowering the learning rate, but it did not change anything. I have also just tried different values of weight decay.

Mohammad_Farazi · March 21, 2022, 4:52pm

I have a shape correspondence task on the FAUST dataset (100 shapes in total). The batch size is one (almost all models use this batch size, and I cannot increase it). I initially use a U-Net and two graph CNN layers in tandem (the U-Net learns point-wise features). I think the model does converge, since the loss decreases, and I stop training at some point. Here is the accuracy of each batch in the test set:
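"It converges because the loss decreases" can be made a bit more concrete with an explicit plateau check on the recorded training losses. A minimal sketch, assuming per-epoch training losses are collected in a Python list; `has_plateaued`, `window` and `rel_tol` are hypothetical names and illustrative defaults, not part of PyTorch:

```python
from statistics import mean
from typing import Sequence


def has_plateaued(losses: Sequence[float], window: int = 5, rel_tol: float = 0.01) -> bool:
    """Return True if the mean loss over the last `window` epochs improved by less
    than `rel_tol` (relative) compared with the `window` epochs before it."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge yet
    previous = mean(losses[-2 * window:-window])
    recent = mean(losses[-window:])
    if previous == 0:
        return True  # a non-negative loss cannot improve on zero
    improvement = (previous - recent) / abs(previous)
    return improvement < rel_tol


still_falling = [1.0 / (epoch + 1) for epoch in range(10)]
flat = [0.5] * 10
print(has_plateaued(still_falling), has_plateaued(flat))
```

This only says the training loss has flattened out; it says nothing about how well the model generalizes to the test shapes.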

accuracy: 0.7264
accuracy: 0.6181
accuracy: 0.5443
accuracy: 0.5376
accuracy: 0.5251
accuracy: 0.5210
accuracy: 0.5014
accuracy: 0.4616
accuracy: 0.4720
accuracy: 0.4741
accuracy: 0.4778
accuracy: 0.4814
accuracy: 0.4782
accuracy: 0.4676
accuracy: 0.4705
accuracy: 0.4721
accuracy: 0.4744
accuracy: 0.4756
accuracy: 0.4768
accuracy: 0.4784
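The evaluation code is not shown in the thread, so it is worth checking how these numbers are computed. The later values drift smoothly toward about 0.47 to 0.48, which is what a cumulative (running) average over the test set would look like. Assuming, purely hypothetically, that each printed value is the running mean of all test batches so far rather than the accuracy of that batch alone, the individual per-batch accuracies can be recovered with a_n = n * m_n - (n - 1) * m_{n-1}. `per_batch_from_running_mean` is a hypothetical helper written for this sketch:

```python
from typing import List, Sequence


def per_batch_from_running_mean(running: Sequence[float]) -> List[float]:
    """Recover individual batch values from a cumulative mean.

    If m_n = (a_1 + ... + a_n) / n, then a_n = n * m_n - (n - 1) * m_{n-1}.
    """
    values = []
    for n, m_n in enumerate(running, start=1):
        previous_total = (n - 1) * running[n - 2] if n > 1 else 0.0
        values.append(n * m_n - previous_total)
    return values


# The 20 accuracies posted above, in order.
reported = [0.7264, 0.6181, 0.5443, 0.5376, 0.5251, 0.5210, 0.5014, 0.4616,
            0.4720, 0.4741, 0.4778, 0.4814, 0.4782, 0.4676, 0.4705, 0.4721,
            0.4744, 0.4756, 0.4768, 0.4784]

for i, acc in enumerate(per_batch_from_running_mean(reported), start=1):
    print(f"batch {i:2d}: {acc:.4f}")
```

Under that assumption every recovered value stays between 0 and 1, which is consistent with (but does not prove) the running-mean reading. If it holds, only the first printed number is a single-sample accuracy, and printing each batch's accuracy separately would show whether the first test shape is really an outlier.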