- What is the minimum number of images per folder to get meaningful results?
- Would transfer learning still work if my dataset has severely imbalanced classes and the class with the fewest data points meets the requirement in #1?
- What ratio should I use to split the data points into training and validation sets? Is it 80/20 or 60/40?
I’m not sure one can give a single number, as I think it depends on your use case. For example, if your data distribution is quite simple, e.g. black images vs. white images, you will cover this distribution with just a few samples. A more complex use case, on the other hand, might need more data. You could try to create “new samples” using data augmentation, but if you only have a few samples, it’s hard to predict how your system will perform in the real world.
In my opinion, training on an imbalanced dataset is not necessarily linked to transfer learning. You might run into the same challenges whether you train from scratch or fine-tune a model.
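One common way to deal with imbalance (whether you fine-tune or train from scratch) is to weight each class inversely to its frequency. Here is a minimal sketch of that idea, assuming your labels are available as a plain list. The function name `inverse_frequency_weights` is hypothetical. The formula `total / (num_classes * count)` is the same “balanced” heuristic scikit-learn uses for `class_weight="balanced"`. In PyTorch you could then pass these values, as a tensor ordered by class index, to the `weight` argument of `nn.CrossEntropyLoss`.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical helper: return one weight per class, proportional to 1 / count.

    Uses the "balanced" heuristic total / (num_classes * count), so every
    class contributes the same total weight (total / num_classes) to the loss.
    """
    counts = Counter(labels)
    num_classes = len(counts)
    total = len(labels)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


# Toy example: 90 "cat" images vs. 10 "dog" images.
labels = ["cat"] * 90 + ["dog"] * 10
weights = inverse_frequency_weights(labels)
print(weights)  # the rare class ("dog") gets the much larger weight
```

Note that weighting only rebalances the loss. It doesn’t add information, so a minority class with very few samples can still generalize poorly.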
For “small” datasets, you could use 15-25% of your data for validation, so an 80/20 split would fall within that range. If you have a massive dataset (which apparently isn’t the case here), you could go down to using just 1% of the data.
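With imbalanced classes, a purely random split can leave very few (or no) minority-class samples in the validation set. Below is a minimal sketch of a stratified split that takes the validation fraction separately from each class. It assumes your samples and labels are two parallel lists (e.g. file paths and folder names). `stratified_split` is a hypothetical helper, not a library function.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, val_fraction=0.2, seed=0):
    """Hypothetical helper: split (sample, label) pairs into train/val per class.

    Each class contributes roughly val_fraction of its items to the
    validation set, so rare classes are still represented there.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val = [], []
    for label, items in sorted(by_class.items()):
        rng.shuffle(items)
        # Keep at least one validation item per class if the class has 2+ items.
        n_val = max(1, round(len(items) * val_fraction)) if len(items) > 1 else 0
        val.extend((s, label) for s in items[:n_val])
        train.extend((s, label) for s in items[n_val:])
    return train, val


samples = [f"img_{i}.jpg" for i in range(100)]
labels = ["cat"] * 90 + ["dog"] * 10
train, val = stratified_split(samples, labels, val_fraction=0.2)
print(len(train), len(val))
```

With `val_fraction=0.2` this gives an 80/20 split overall while keeping the 90/10 class ratio in both parts.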
These are just my two cents, so let’s wait for other opinions.