I am training my model on a 3D dataset consisting of 100 samples. The batch size I use is 1 (I cannot use a larger batch size). Although I use shuffle=True in the dataloader, when I test my model it seems to overfit to the first batch (i.e. the first data sample): the test accuracy of the first batch in the test set is considerably higher than the rest. It seems my model is overfitting to the first data sample. Any help or thoughts are appreciated.
Hard to tell without the data and details of what you are doing. However, this doesn’t seem like overfitting to me; it sounds more like your model is not converging. This could be due to a poor hyper-parameter choice, such as the learning rate, or some other error.
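One quick way to tell the two cases apart is to compare per-sample accuracy on the training set with per-sample accuracy on the test set. Below is a rough sketch of that check, not a definitive test: the function name `diagnose`, the thresholds `low` and `gap`, and all the numbers are assumptions made up for illustration, so you would plug in your own accuracies and cut-offs.

```python
from statistics import mean


def diagnose(train_acc, test_acc, low=0.5, gap=0.2):
    """Give a rough label from per-sample accuracies in [0, 1].

    `low` and `gap` are illustrative thresholds (assumptions), not tuned values.
    """
    train_mean = mean(train_acc)
    test_mean = mean(test_acc)
    if train_mean < low:
        # The model does not even fit the training data.
        return "not converging"
    if train_mean - test_mean > gap:
        # Training data is fit well, but that does not carry over to test data.
        return "overfitting"
    return "ok"


# Made-up numbers, for illustration only.
print(diagnose([0.30, 0.35, 0.28], [0.90, 0.25, 0.27]))  # low training accuracy
print(diagnose([0.98, 0.97, 0.99], [0.60, 0.55, 0.58]))  # large train/test gap
```

If the training accuracy is low across the board, a single high test sample is more likely noise or a data issue than overfitting.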
Yes, the weights are random and uniform. I did try lowering the learning rate, but it did not change anything. I also just tried changing the weight decay to different values.
I have a shape correspondence task with the FAUST dataset (a total of 100 shapes). The batch size is one (almost all models use this batch size and I cannot increase it). I first use a U-Net, followed by two graph CNN layers in tandem (the U-Net is for learning point-wise features). I think it does converge, as the loss decreases and I stop at some point. Here is the accuracy of each batch in the test set: