Hello everyone, I know what you are thinking: "yet another question about neural network size and hyperparameters". My question is specific to a given scenario:
- I have a small dataset
- I am classifying images into two classes
- I am afraid of overfitting
- I am afraid of underfitting
- My neural network reaches 90% accuracy on the validation dataset after one epoch
- Adding or removing one or two of my 6 layers does not noticeably change performance
Does this mean that I am creating a model bigger than needed?
I tried a smaller architecture. It reached the same validation accuracy as the bigger one. However, the bigger network sometimes (depending on the training run) seems to work better in deployment, where I have no labeled data, only human judgment of the outputs. This is not always true: sometimes retraining the bigger network from scratch leads to poorer results. Also, after training for a few epochs the training and validation losses drop and then get stuck at what looks like a bad local minimum.
My conclusions are that:
- My validation dataset does not represent a good sample of the real case scenarios
- I need to decrease the learning rate
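On the learning-rate point, one cheap experiment is a step-decay schedule: start at the current rate and drop it periodically, to see whether the losses escape the plateau. A minimal sketch (the drop factor and interval here are arbitrary assumptions, not recommendations):

```python
def step_decay(initial_lr, epoch, drop_factor=0.5, epochs_per_drop=10):
    """Multiply the learning rate by drop_factor every epochs_per_drop epochs."""
    return initial_lr * (drop_factor ** (epoch // epochs_per_drop))

# Starting from 1e-3: 1e-3 for epochs 0-9, 5e-4 for 10-19, 2.5e-4 for 20-29, ...
for epoch in (0, 10, 25):
    print(epoch, step_decay(1e-3, epoch))
```

Most frameworks ship an equivalent built-in scheduler, so in practice you would use that rather than hand-roll one; the point is just that it is a one-line experiment.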
However, I am no longer sure whether I am underfitting or overfitting. Based on your experience:
Is there still anything I can do, besides collecting more data, to figure out whether I am underfitting or overfitting? I am also not sure whether I should decrease or increase the number of layers.
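One framing I find useful: compare the tails of the training and validation loss curves. A large, growing gap with validation loss rising off its minimum suggests overfitting; both losses high and barely moving suggests underfitting. A crude sketch of that rule of thumb (the thresholds are illustrative assumptions, not universal constants):

```python
def diagnose(train_loss, val_loss, gap_tol=0.1):
    """Crude heuristic on per-epoch loss histories (lists of floats)."""
    gap = val_loss[-1] - train_loss[-1]
    val_rising = val_loss[-1] > min(val_loss) + 1e-6  # past its best epoch
    if gap > gap_tol and val_rising:
        return "overfitting: validation loss diverging from training loss"
    if gap <= gap_tol and train_loss[-1] > train_loss[0] * 0.5:
        return "possible underfitting: training loss barely decreased"
    return "no clear sign; consider k-fold cross-validation"

print(diagnose([1.0, 0.5, 0.2, 0.1], [1.0, 0.6, 0.7, 0.9]))
print(diagnose([1.0, 0.95, 0.9, 0.9], [1.0, 0.97, 0.92, 0.92]))
```

With only a few hundred examples, k-fold cross-validation is probably the more trustworthy diagnostic anyway: it tells you how sensitive the validation score is to the particular split, which speaks directly to your suspicion that the validation set is not representative.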
This is a simple binary classification problem that takes as input grayscale images of 200x200 pixels. The goal is to detect images containing certain blobs of white color (of varying shapes) while discarding others that may only contain noise. I have only a few hundred examples. Do you think using 6 layers and about 30k parameters overall could be overkill? I am mainly using 3x3 convolutions followed by 1x1 convolutions to reduce the parameter count. The main challenge is the lack of data.
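For context on the 30k figure: a k x k conv layer mapping c_in to c_out channels has k*k*c_in*c_out + c_out parameters (with bias). Counting a hypothetical 6-layer stack of alternating 3x3/1x1 convs (the channel widths below are my assumptions; the post does not give them) shows that ~30k parameters overall, once a small classifier head is added, is entirely plausible:

```python
def conv_params(k, c_in, c_out):
    """Number of parameters in a k x k conv layer with bias."""
    return k * k * c_in * c_out + c_out

# Hypothetical 6-layer stack alternating 3x3 and 1x1 convs
# (channel widths are assumptions, not from the question):
layers = [
    (3, 1, 16),   # 3x3, 1 -> 16 channels
    (1, 16, 8),   # 1x1 bottleneck
    (3, 8, 32),
    (1, 32, 16),
    (3, 16, 64),
    (1, 64, 64),
]
total = sum(conv_params(*layer) for layer in layers)
print(total)  # 16600 for these widths; a classifier head adds the rest
```

For a few hundred images this is a small network by modern standards, so the parameter count alone is unlikely to be the problem; heavy data augmentation and regularization (dropout, weight decay) typically matter more in this regime than shaving off a layer.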