I have a trained CNN with 70% accuracy. I apply transfer learning: I add a Global Average Pooling layer and fine-tune this new model. However, the train and validation accuracy differ wildly depending on PyTorch's random weight initialisation (I've seen anywhere from 10% to 50% so far), even across different learning rates (e.g. 1e-06). The train and validation loss are simply stuck around their initial values, even after 400 epochs of backpropagation.
I have made sure to enable gradients on the model's parameters by setting requires_grad = True (it's an attribute, not a call, so requires_grad() = True would be invalid). What could be a possible reason for this bizarre behaviour? Thank you!
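For reference, here is a minimal sketch of my setup. The backbone, layer sizes, and class count are toy stand-ins for my actual CNN; the relevant parts are the Global Average Pooling head and how requires_grad is set:

```python
import torch
import torch.nn as nn

# Toy stand-in for the pretrained CNN backbone
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1),
    nn.ReLU(),
)

# New head: Global Average Pooling followed by a fresh classifier
model = nn.Sequential(
    backbone,
    nn.AdaptiveAvgPool2d(1),  # global average pooling to 1x1
    nn.Flatten(),
    nn.Linear(16, 10),        # hypothetical 10-class output
)

# requires_grad is an attribute (or the in-place method requires_grad_),
# not a method call on the left-hand side
for p in model.parameters():
    p.requires_grad = True

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-6
)

# Sanity check: one forward/backward pass should leave non-zero
# gradients on the backbone, not just on the new head
x = torch.randn(4, 3, 32, 32)
loss = nn.CrossEntropyLoss()(model(x), torch.randint(0, 10, (4,)))
loss.backward()
optimizer.step()
```

After loss.backward(), inspecting p.grad on the first conv layer confirms whether gradients actually reach the backbone.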