I’m using a vision transformer (ViT) on a small medical imaging dataset (~1,000 images).
Usually my first go-to sanity check is to overfit the training data, but I haven’t had any luck doing so. I’ve checked the input pipeline, verified that the labels all look correct, swept the learning rate, etc.
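For context, by "overfit sanity check" I mean the standard trick of training on a tiny fixed batch and confirming the loss goes to ~0. Here's a minimal NumPy sketch of the idea (a toy MLP on random data standing in for my actual ViT and images; all names and sizes here are illustrative, not my real setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-in for the real setup: 8 random "images", binary labels.
X = rng.standard_normal((8, 64))   # 8 samples, 64 features each
y = rng.integers(0, 2, size=8)     # arbitrary binary labels

# One-hidden-layer MLP: easily enough capacity to memorize 8 points.
W1 = rng.standard_normal((64, 32)) * 0.1
b1 = np.zeros(32)
W2 = rng.standard_normal((32, 1)) * 0.1
b2 = np.zeros(1)

lr = 0.1
for step in range(2000):
    # Forward pass
    h = np.maximum(0, X @ W1 + b1)          # ReLU hidden layer
    logits = (h @ W2 + b2).ravel()
    p = 1 / (1 + np.exp(-logits))           # sigmoid
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

    # Backward pass (manual gradients for binary cross-entropy)
    dlogits = (p - y) / len(y)
    dW2 = h.T @ dlogits[:, None]
    db2 = dlogits.sum(keepdims=True)
    dh = dlogits[:, None] @ W2.T
    dh[h <= 0] = 0                          # ReLU gradient mask
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

acc = np.mean((p > 0.5) == y)
print(f"final loss {loss:.4f}, train accuracy {acc:.0%}")
```

With a model of sufficient capacity this drives the training loss to essentially zero and hits 100% train accuracy on the memorized batch; that's the behavior I'm failing to reproduce with the ViT.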
I saw in another forum that someone said you shouldn’t expect to be able to overfit, because ViTs require orders of magnitude more data.
I’m not looking for advice on how to troubleshoot the failure to fit the training data, but on whether I should expect a ViT to be able to overfit a dataset this small at all.