Can't overfit to training data with ViT

I’m using a Vision Transformer (ViT) on a small medical imaging dataset (roughly 1000 images).

Usually my first go-to sanity check is to overfit the model to the training data, but I haven’t had any luck doing so. I’ve checked the input data, verified that the labels all look correct, swept the learning rate, etc.
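For reference, this is roughly the kind of check I mean — a minimal sketch with a toy ViT and random data, not my actual model or dataset (all sizes and hyperparameters here are just illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the setup: 8 random 32x32 "images", 2 classes.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 2, (8,))

class TinyViT(nn.Module):
    """Minimal ViT: conv patch embedding + transformer encoder + mean pooling."""
    def __init__(self, dim=32, depth=2, heads=4, patch=8, n_classes=2):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (32 // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        tokens = self.patchify(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens + self.pos)
        return self.head(tokens.mean(dim=1))  # mean-pool instead of a CLS token

model = TinyViT()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train on the same fixed batch; the loss should collapse toward zero
# if the model/optimizer/labels are all wired up correctly.
for step in range(300):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.4f}")
```

If a tiny model like this memorizes a fixed batch but the real setup won’t, that points at the pipeline rather than the architecture.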

I saw in another forum that someone said you shouldn’t expect to be able to overfit, as ViTs require orders of magnitude more data.

I’m not looking for advice on troubleshooting the failure to fit the training data — I want to know whether I should expect a ViT to be able to overfit a dataset this small at all.


It’s news to me that a model would require a lot of data in order to overfit — the simplest version of this check is to overfit any model on a single sample.
What is the explanation for this alleged effect?

Thanks for your response. There wasn’t a particular explanation given, and it all sounded a bit vague. It sounded strange to me too, so I think I’ll persist and try a smaller, simpler dataset to see if I can get it working.

Thanks a lot