I would like to test the impact of randomly resetting (re-initializing) certain neurons during training, in a Pytorch-based model with a number of linear layers like this:
class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.linear1 = nn.Linear(D, M) self.linear2 = nn.Linear(M, M) self.linear3 = nn.Linear(M, M) self.linear4 = nn.Linear(M, K) def forward(self, x): x = F.relu(self.linear1(x)) x = F.relu(self.linear2(x)) x = F.relu(self.linear3(x)) x = self.linear4(x) x = F.log_softmax(x, dim=1) return x
Ideally, I would like to implement a function/code snippet that resets the weights of 1 or more random neurons in each layer at the start of each epoch. The idea is that each neuron has a small probability of being re-initialized each epoch, to see what the effects are on performance in general, as well as on common issues such as dying relu’s and vanishing gradients for sigm and tanh.
In other words: instead of a single epoch dropout, you’d have a permanent replacement of an existing neuron with a ‘fresh’ new trainable neuron.
I tried implementing this via calls to parameters or state_dict, but so far I only get permission and anomaly errors when trying this. I’m also not sure whether this should be implemented in the module itself or in the training part.
Any suggestions would be highly welcome!