- What is the minimum number of images per folder to get meaningful results?
- Would transfer learning still work if my dataset has severely imbalanced classes, as long as the smallest class meets the requirement in #1?
- What ratio of data points should go into train vs. val? Is it 80/20 or 60/40?
- I’m not sure one can give a single number, since it depends on your use case. If your data distribution is very simple, e.g. black images vs. white images, a few samples will cover it. A more complex use case will need more data. You could try to create “new samples” using data augmentation, but with only a few real samples it’s hard to predict how your system will perform in the real world.
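
To make the augmentation idea concrete, here is a minimal sketch in plain Python. It assumes an image is just a list of pixel rows, and the helper names `hflip` and `augment` are made up for illustration; in practice you would use your framework's transform utilities. Keep in mind that a flipped copy adds variation, not new information, so it doesn't replace real samples.

```python
import random


def hflip(image):
    """Mirror an image left to right. `image` is a list of pixel rows."""
    return [row[::-1] for row in image]


def augment(image, rng, flip_prob=0.5):
    """Return a flipped copy with probability `flip_prob`, else a plain copy."""
    if rng.random() < flip_prob:
        return hflip(image)
    return [row[:] for row in image]


# Toy 2x3 "image"; the values stand in for pixel intensities.
image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)  # fixed seed so runs are repeatable
flipped = hflip(image)
new_samples = [augment(image, rng) for _ in range(4)]
```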
- In my opinion, training on an imbalanced dataset is not really tied to transfer learning. You will run into the same challenges whether you train from scratch or fine-tune a model.
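
One common way to deal with imbalance in either setting (my own suggestion here, not something required by transfer learning) is to weight each class inversely to its frequency and pass those weights to your loss or sampler. A minimal sketch, with the hypothetical helper name `inverse_frequency_weights` and a made-up 90/10 label list:

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Made-up imbalanced dataset: 90 "cat" images and 10 "dog" images.
labels = ["cat"] * 90 + ["dog"] * 10
weights = inverse_frequency_weights(labels)
```

Whether weighting, oversampling, or doing nothing works best depends on the data, so treat this as a starting point to experiment with.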
- For “small” datasets, you could try using 15-25% of your dataset for validation. If you have a massive dataset (which is apparently not the case here), you could go down to using just 1% of the data.
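
Since the classes are imbalanced, it may help to do the split per class, so the smallest class still shows up in validation. Here is a rough sketch in plain Python; `stratified_split` is a hypothetical helper, and the 20% fraction and the 80/20 label list are just example values.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        if len(indices) > 1:
            # Hold out at least one sample of every class with 2+ samples.
            n_val = max(1, round(len(indices) * val_fraction))
        else:
            n_val = 0  # a single sample stays in the training set
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Made-up dataset: 80 "cat" images and 20 "dog" images.
labels = ["cat"] * 80 + ["dog"] * 20
train_idx, val_idx = stratified_split(labels, val_fraction=0.2)
```

With these example values each class contributes 20% of its samples to validation, so the class ratio is the same in both splits.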
These are just my two cents, so let’s wait for other opinions.