How to re-initialize individual neurons during training

Joost_van_der_Burgt · May 25, 2021, 6:10pm

I would like to test the impact of randomly resetting (re-initializing) certain neurons during training, in a PyTorch-based model with a number of linear layers like this:

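import torch.nn as nn
import torch.nn.functional as F

# D, M and K (input, hidden and output sizes) are assumed to be defined elsewhere
class Net(nn.Module):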
    def __init__(self):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(D, M)
        self.linear2 = nn.Linear(M, M)
        self.linear3 = nn.Linear(M, M)
        self.linear4 = nn.Linear(M, K)

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = F.relu(self.linear3(x))
        x = self.linear4(x)
        x = F.log_softmax(x, dim=1)
               
        return x

Ideally, I would like to implement a function or code snippet that resets the weights of one or more random neurons in each layer at the start of each epoch. The idea is that each neuron has a small probability of being re-initialized each epoch, to see how this affects performance in general, as well as common issues such as dying ReLUs and vanishing gradients with sigmoid and tanh activations.

In other words: instead of a single-epoch dropout, you’d permanently replace an existing neuron with a ‘fresh’ new trainable neuron.

I tried implementing this via calls to parameters or state_dict, but so far I only get permission and anomaly errors when trying this. I’m also not sure whether this should be implemented in the module itself or in the training part.

Any suggestions would be highly welcome!

pascal_notsawo · May 25, 2021, 6:33pm

There are several ways to initialize the parameters of a model in deep learning; see torch.nn.init in the PyTorch 1.8.1 documentation.

So based on the code you proposed above, I suggest something like:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linears = nn.ModuleList([
            nn.Linear(D, M),
            nn.Linear(M, M),
            nn.Linear(M, M),
            nn.Linear(M, K),
        ])

        # ...
    # forward
    # ...

def init_linear(linear):
    with torch.no_grad():
        # Choose your approach here, I just suggest some
        nn.init.normal_(linear.weight, mean=0, std=1) 
        #nn.init.xavier_uniform_(linear.weight)
        nn.init.constant_(linear.bias, 0.)

# Or when you want to initialize the whole model
def init_model(model):
    with torch.no_grad():
        for name, param in model.named_parameters():
            # Choose your approach here, I just suggest some
            if 'weight' in name:
                torch.nn.init.xavier_uniform_(param.data)
            elif 'bias' in name:
                param.data.fill_(0)
            # elif :
            # ...

And during training:

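import random

for _ in range(MAX_EPOCH):
    # Randomly pick the indices of the layers to re-initialize this epoch.
    # Here each layer is chosen independently with probability P_RESET,
    # a placeholder name for a value you set yourself.
    to_init = [i for i in range(len(model.linears)) if random.random() < P_RESET]
    for i in to_init:
        init_linear(model.linears[i])
    # ...
    # training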
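
Note that the loop above resets whole layers, while the original question asks about individual neurons. In an nn.Linear layer, one output neuron corresponds to one row of weight plus one entry of bias, so a per-neuron reset can be done by overwriting just those rows inside torch.no_grad() (in-place changes to parameters outside no_grad are a likely cause of the errors mentioned in the question). Below is a minimal sketch, not a definitive implementation: the function name reinit_neurons and the probability p are made up for illustration, and the uniform range is chosen on the assumption that you want roughly the same scale as nn.Linear's default weight initialization.

```python
import math

import torch
import torch.nn as nn


def reinit_neurons(linear, p=0.01, generator=None):
    """Re-initialize each output neuron of `linear` with probability `p`.

    Hypothetical helper: returns the indices of the neurons that were reset.
    """
    # One independent draw per output neuron (one row of linear.weight).
    mask = torch.rand(linear.out_features, generator=generator) < p
    idx = mask.nonzero(as_tuple=True)[0]
    if idx.numel() == 0:
        return idx

    # Parameters are leaf tensors that require grad, so in-place edits
    # have to happen with autograd disabled.
    with torch.no_grad():
        # Assumption: draw from U(-1/sqrt(fan_in), 1/sqrt(fan_in)),
        # the range nn.Linear uses for its weights by default.
        bound = 1.0 / math.sqrt(linear.in_features)
        fresh = torch.empty(idx.numel(), linear.in_features)
        fresh.uniform_(-bound, bound, generator=generator)
        linear.weight[idx] = fresh.to(linear.weight.device)
        if linear.bias is not None:
            linear.bias[idx] = 0.0  # assumption: start the fresh neuron with zero bias
    return idx


# Usage: give every neuron a small chance of being reset at the start of each epoch.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
for epoch in range(3):
    for layer in model:
        if isinstance(layer, nn.Linear):
            reinit_neurons(layer, p=0.1)
    # ... usual training loop for this epoch ...
```

Two caveats with this sketch: it only touches the incoming weights of a neuron, so the next layer still holds the old outgoing weights for it, and optimizers that keep per-parameter state (momentum, Adam) will keep their old statistics for the reset rows unless you clear those as well. You would probably also want to skip the output layer, since resetting a class logit is a different experiment.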