Hi,
so I am training a model and the performance varies considerably when I use a different seed. I assume its is due to the weight initiation, so I was wondering what I could do to combat that issue.
I did a hyper-parameter optimization and that is why I feel the learning rate is appropriate.
Maybe there are ways to change the weight initialization, below is the way I do it at the moment.
The weight and weight-skip matrices are of same size.
If you need more information I am happy to provide them
def reset_parameters(self):
stdv = 1. / math.sqrt(self.weight.size(1))
self.weight.data.uniform_(0, stdv)
stdv = 1. / math.sqrt(self.weightSkip.size(1))
self.weightSkip.data.uniform_(0, stdv)
if self.bias is not None:
self.bias.data.uniform_(-stdv, stdv)