Can't overfit to training data with ViT

alixbird · October 11, 2023, 4:14am

I’m using a visual transformer on a small medical imaging dataset (1000 images or so).

Usually my first go to sanity check is to overfit to the training data, but I haven’t had any luck in doing so. I have checked the input data, that the labels all look correct, the learning rate etc.

I saw in another forum that someone had said that you shouldn’t expect to be able to overfit as ViT’s require so many more magnitudes of data.

I’m not looking for advice on how to troubleshoot it not fitting to the training data, but whether or not I should expect it to be able to.

TIA

ptrblck · October 11, 2023, 4:37am

It’s news to me that a model requires a lot of data to overfit as the simplest use case is to overfit any model on a single sample.
What is the explanation for this alleged effect?

alixbird · October 11, 2023, 5:38am

Thanks for your response. There wasn’t a particular explanation, and it all sounded a bit vague. It sounded strange to me too so I think I will persist and try with a small simpler dataset to see if I can get it working.

Thanks a lot