I have a regression model that seems to be highly sensitive to its weight initialization: sometimes the model converges, while other times it looks like it's about to converge and then explodes to NaN. My input features are scaled to [-1, 1] and my outputs are standardized. I am using a tanh activation function and a kaiming_uniform weight initialization scheme. How do I make training reliable enough that I don't have to worry about a run failing and forcing a retrain? I should also note that I am using the LBFGS optimizer with a learning rate of 0.8 and hope to maintain the same level of computational performance. Any thoughts?
```python
init.kaiming_uniform_(self.weight, nonlinearity='tanh')
if self.bias is not None:
    fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
    bound = 1 / np.sqrt(fan_in)
    init.uniform_(self.bias, -bound, bound)
```
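For context, here is a minimal runnable sketch of the layer that snippet lives in. The `TanhLinear` class name and the layer sizes are my own for illustration, not from my actual model:

```python
import math

import torch
import torch.nn as nn
from torch.nn import init


class TanhLinear(nn.Linear):
    """Linear layer whose parameters are re-initialized as in the snippet above."""

    def reset_parameters(self):
        # Kaiming uniform scaled with the tanh gain (5/3) instead of the
        # default leaky_relu gain.
        init.kaiming_uniform_(self.weight, nonlinearity='tanh')
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)


layer = TanhLinear(8, 4)
x = torch.rand(2, 8) * 2 - 1          # inputs scaled to [-1, 1], as in my setup
out = torch.tanh(layer(x))            # tanh activation after the linear layer
print(out.shape)
```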