Model performance depends greatly on the initial seed


So I am training a model and the performance varies considerably when I use a different seed. I assume it is due to the weight initialization, so I was wondering what I could do to combat that issue.
I did a hyper-parameter optimization, which is why I believe the learning rate is appropriate.

Maybe there are ways to change the weight initialization; below is the way I do it at the moment.
The weight and weight-skip matrices are of the same size.
If you need more information, I am happy to provide it.

def reset_parameters(self):
    stdv = 1. / math.sqrt(self.weight.size(1))
    self.weight.data.uniform_(-stdv, stdv)
    stdv = 1. / math.sqrt(self.weightSkip.size(1))
    self.weightSkip.data.uniform_(-stdv, stdv)
    if self.bias is not None:
        self.bias.data.uniform_(-stdv, stdv)
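
For reference, I do set the seed explicitly per run, roughly like this (a minimal sketch; `seed_everything` is just my helper name), so the run-to-run variance really does come down to the initialization:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed the RNG sources a typical PyTorch training run touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU (and, on recent versions, CUDA) RNG
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


seed_everything(0)
a = torch.rand(3)
seed_everything(0)
b = torch.rand(3)  # identical draws, since the seed was reset
```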
  1. What are the performance ranges with and without your custom initialization?
  2. What type of model are you using? Generative models, say GANs, are very sensitive to initialization.
  3. Empirically, I have observed that using Batch Norm layers tends to ease the dependency on a good initialization.
  4. Xavier’s initialization can be used to prevent the exploding and vanishing gradients problem.
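
For point 4, a sketch of what Xavier initialization could look like for the question's two weight matrices (using `torch.nn.init`; the standalone function and the zero-initialized bias here are illustrative choices, not the only option):

```python
import math

import torch
from torch import nn


def reset_parameters_xavier(weight, weight_skip, bias=None):
    """Xavier (Glorot) uniform init instead of the 1/sqrt(fan_in) scheme."""
    nn.init.xavier_uniform_(weight)
    nn.init.xavier_uniform_(weight_skip)
    if bias is not None:
        nn.init.zeros_(bias)


w = torch.empty(64, 32)
w_skip = torch.empty(64, 32)
b = torch.empty(64)
reset_parameters_xavier(w, w_skip, b)

# Xavier uniform samples from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)),
# so every entry is bounded in magnitude by that value.
bound = math.sqrt(6.0 / (32 + 64))
```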


  1. I am not sure what is meant by "without initialization". But for example, with one seed I get an AUC of 0.88 (BCE loss of 0.48) and with a different one an AUC of 0.91 (BCE loss of 0.41). So the loss is actually more concerning to me than the AUC.

  2. It is a Graph Convolutional Network.

  3. That could indeed be an issue, because I removed the batch norm from some layers as that gave better results.

  4. I will have a look at the Xavier init.

Thanks for the reply