I’m facing a problem I can’t figure out.
I am trying to train a binary classification model, which trains to high accuracy in most runs (about 90% on the validation set). However, in some runs, the model does no better than a random guess (50%). I made sure that all runs use the same data and even fixed the seed for the data and batch sampling. The only difference between the runs is the random network initialization (same hyperparameters, same everything else).
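For reference, the way I keep the data order fixed while varying only the initialization is roughly this (a minimal plain-Python sketch, not my actual training code; `make_batches` and `init_weights` are hypothetical stand-ins):

```python
import random

def make_batches(n_samples, batch_size, data_seed=0):
    """Shuffle sample indices with a dedicated *data* RNG so the
    batch order is identical in every run."""
    rng = random.Random(data_seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, n_samples, batch_size)]

def init_weights(n_weights, init_seed):
    """Draw initial weights from a *separate* RNG, so the only thing
    that changes between runs is the initialization."""
    rng = random.Random(init_seed)
    return [rng.gauss(0.0, 0.1) for _ in range(n_weights)]

# Same data seed -> identical batches; different init seeds -> different weights.
print(make_batches(8, 4))
print(init_weights(3, 2))
print(init_weights(3, 3))
```

So the batch sampling is reproducible across runs, and only `init_seed` differs.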
I can’t figure out what the problem is, especially since the results are strictly bimodal: every run lands at either ~90% or 50%, with nothing in between.
Can network initialization have such an impact? Also, I don’t think it’s the size of the dataset, as I believe I have more than enough data for this problem (it could still be the cause, but I think it’s unlikely).
Thanks for your help.
Yes, the model initialization could have a significant effect on the success rate of the model training.
I don’t know how long a run takes to show whether training is stuck or reaching the expected good accuracy, but if it’s fast you could sweep over different seeds (or different initialization schemes) and measure the success rate, in order to get a better picture of the training stability.
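A minimal sketch of such a seed sweep; `train_and_eval` here is a toy stand-in for your real training run (it just fakes the bimodal ~90%/50% outcome you describe), so you would replace it with your actual pipeline:

```python
import random

def train_and_eval(init_seed):
    """Placeholder for the real training run: seed the init,
    train, and return validation accuracy. This stand-in mimics
    runs that either learn (~0.9) or get stuck (0.5)."""
    rng = random.Random(init_seed)
    return 0.9 if rng.random() < 0.7 else 0.5

# Run the same configuration under many init seeds and count successes.
accs = {seed: train_and_eval(seed) for seed in range(20)}
failing = sorted(s for s, a in accs.items() if a <= 0.6)
rate = 1.0 - len(failing) / len(accs)
print(f"success rate: {rate:.0%}, failing seeds: {failing}")
```

The list of failing seeds is useful to keep around, since it lets you reproduce a stuck run deterministically for debugging.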
Hello, thanks for your reply.
Yes, that’s what I did: I tried different seeds, and depending on the seed the model either trains to good validation accuracy or doesn’t train at all.
Sorry, I want to add some more information I should have put in the first post. When it doesn’t work, it doesn’t train at all: it always predicts the same class, and even the train loss isn’t decreasing at all (hence the 50% accuracy).
Since my first post, I have tried different hyperparameters (number of neurons and layers), and each configuration has a different set of seeds for which it doesn’t train. (For example, with the first configuration, the model trains well and gives the expected good performance with seed=2 but doesn’t train at all with seed=3; with the second configuration, the model doesn’t train with seed=2 but does with seed=3.)