Performance of model depends greatly on initial seed

Hi,

So I am training a model, and the performance varies considerably when I use a different seed. I assume it is due to the weight initialization, so I was wondering what I could do to combat that issue.
I did a hyper-parameter optimization, which is why I think the learning rate is appropriate.

Maybe there are ways to change the weight initialization; below is how I do it at the moment.
The weight and weight-skip matrices are of the same size.
If you need more information, I am happy to provide it.

import math

def reset_parameters(self):
    # Scale the uniform range by 1/sqrt(fan_in) of the weight matrix
    stdv = 1. / math.sqrt(self.weight.size(1))
    self.weight.data.uniform_(0, stdv)

    # Same scheme for the skip-connection weights (same size as self.weight)
    stdv = 1. / math.sqrt(self.weightSkip.size(1))
    self.weightSkip.data.uniform_(0, stdv)

    # Bias is drawn from a symmetric range around zero
    if self.bias is not None:
        self.bias.data.uniform_(-stdv, stdv)
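One thing I already noticed is that I draw the weights from [0, stdv] rather than a symmetric range. For comparison, a symmetric variant (roughly what older PyTorch versions of nn.Linear did by default) would look like the following sketch:

def reset_parameters(self):
    # Symmetric uniform range around zero instead of [0, stdv]
    stdv = 1. / math.sqrt(self.weight.size(1))
    self.weight.data.uniform_(-stdv, stdv)

    stdv = 1. / math.sqrt(self.weightSkip.size(1))
    self.weightSkip.data.uniform_(-stdv, stdv)

    if self.bias is not None:
        self.bias.data.uniform_(-stdv, stdv)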
  1. What are the different performance ranges, with and without your initialization?
  2. What type of model are you using? Generative models, say GANs, are very sensitive to initialization.
  3. Empirically, I have observed that using Batch Norm layers tends to ease the dependency on a good initialization.
  4. Xavier initialization can be used to prevent the exploding and vanishing gradients problem; a sketch is below this list.
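A minimal sketch of what that could look like with torch.nn.init (the tensor shapes here are placeholders; adapt the gain to your activation):

import torch
import torch.nn as nn

# Placeholder parameters standing in for your weight, weightSkip and bias
weight = nn.Parameter(torch.empty(64, 32))
weightSkip = nn.Parameter(torch.empty(64, 32))
bias = nn.Parameter(torch.empty(64))

# Xavier/Glorot uniform keeps the activation variance roughly constant
# across layers; pick the gain for your nonlinearity (here: ReLU).
gain = nn.init.calculate_gain('relu')
nn.init.xavier_uniform_(weight, gain=gain)
nn.init.xavier_uniform_(weightSkip, gain=gain)
nn.init.zeros_(bias)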

Hi,

  1. I am not sure what you mean by "without initialization". But, for example, with one seed I get an AUC of 0.88 (BCE loss of 0.48), and with a different one an AUC of 0.91 (BCE loss of 0.41). So the loss is actually more concerning to me than the AUC.

  2. It is a Graph Convolutional Network.

  3. That could indeed be an issue, because for some layers I removed the batch norm as it gave better results; a rough sketch of where I could add it back is below this list.

  4. I will have a look at the Xavier init
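Just to illustrate where I could put the batch norm back, a rough sketch with placeholder names (not my actual layer):

import torch
import torch.nn as nn

# Hypothetical block: linear transform of node features -> BatchNorm1d -> ReLU.
# Normalizing the features over the nodes is what tends to reduce the
# sensitivity to the initial weights.
class GCNBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.bn = nn.BatchNorm1d(out_features)

    def forward(self, x, adj):
        # x: (num_nodes, in_features), adj: normalized (num_nodes, num_nodes) adjacency
        x = adj @ self.linear(x)   # neighborhood aggregation
        x = self.bn(x)             # normalize each feature over the nodes
        return torch.relu(x)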

Thanks for the reply