Performance of model depends greatly on initial seed

JanoschMenke · March 26, 2020, 8:50am

Hi,

I am training a model and its performance varies considerably when I use a different seed. I assume this is due to the weight initialization, so I was wondering what I could do to combat that issue.
I ran a hyperparameter optimization, which is why I believe the learning rate is appropriate.

Maybe there are better ways to initialize the weights. Below is how I do it at the moment.
The weight and weight-skip matrices are the same size.
If you need more information, I am happy to provide it.

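import math

def reset_parameters(self):
    # Scale the range by 1/sqrt(size(1)) of each weight matrix.
    stdv = 1. / math.sqrt(self.weight.size(1))
    # Samples from [0, stdv], so all initial weights are non-negative.
    self.weight.data.uniform_(0, stdv)

    stdv = 1. / math.sqrt(self.weightSkip.size(1))
    self.weightSkip.data.uniform_(0, stdv)

    if self.bias is not None:
        # The bias reuses the stdv computed from weightSkip.
        self.bias.data.uniform_(-stdv, stdv)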

charan_Vjy · March 26, 2020, 11:09am

What are the performance ranges with and without initialization?
What type of model are you using? Generative models, such as GANs, are very sensitive to initialization.
Empirically, I have observed that using batch norm layers tends to ease the dependency on a good initialization.
Xavier initialization can help mitigate the exploding and vanishing gradient problem.
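As a rough sketch of what switching to Xavier initialization could look like: the class name `SkipLayer`, its constructor arguments, and the choice to zero the bias are assumptions made for illustration, not taken from the original model. `torch.nn.init.xavier_uniform_` samples from a symmetric range `[-a, a]` with `a = sqrt(6 / (fan_in + fan_out))`, unlike the `[0, stdv]` range in the question.

```python
import math

import torch
import torch.nn as nn


class SkipLayer(nn.Module):
    """Hypothetical stand-in for the layer in the question."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_features, out_features))
        self.weightSkip = nn.Parameter(torch.empty(in_features, out_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(out_features))
        else:
            self.register_parameter("bias", None)
        self.reset_parameters()

    def reset_parameters(self):
        # Xavier/Glorot uniform: symmetric around zero, scaled by fan_in + fan_out.
        nn.init.xavier_uniform_(self.weight)
        nn.init.xavier_uniform_(self.weightSkip)
        if self.bias is not None:
            # Assumption: start the bias at zero (a common choice, not required).
            nn.init.zeros_(self.bias)


torch.manual_seed(0)
layer = SkipLayer(64, 32)
# Theoretical bound of the Xavier uniform range for a 64 x 32 matrix.
bound = math.sqrt(6.0 / (64 + 32))
```

Whether this reduces the seed sensitivity depends on the model, so it is worth comparing several seeds before and after the change.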

JanoschMenke · March 26, 2020, 1:05pm

Hi,

I am not sure what is meant by "without initialization". As an example, with one seed I get an AUC of 0.88 (BCE loss of 0.48) and with a different one an AUC of 0.91 (BCE loss of 0.41). The loss is actually more concerning to me than the AUC.
It is a graph convolutional network.
That could indeed be an issue, because I removed batch norm from some layers, as that gave better results.
I will have a look at the Xavier init.

Thanks for the reply
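To put a number on the seed-to-seed spread described above, one option is to summarize each metric over several runs. The sketch below is only an illustration: the helper name `summarize` is hypothetical, and it uses just the two runs quoted in this thread, so in practice more seeds would be needed for a meaningful estimate.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) of per-seed metric values."""
    return statistics.mean(values), statistics.stdev(values)


# The two runs reported in this thread (one value per seed).
auc_per_seed = [0.88, 0.91]
bce_per_seed = [0.48, 0.41]

auc_mean, auc_std = summarize(auc_per_seed)
bce_mean, bce_std = summarize(bce_per_seed)
print(f"AUC: {auc_mean:.3f} +/- {auc_std:.3f}")
print(f"BCE: {bce_mean:.3f} +/- {bce_std:.3f}")
```

Comparing this spread before and after a change such as a different initialization gives a fairer picture than comparing two single runs.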