Initializing weights and biases to a specific vector in Python

Hi all.

Recently at work we’ve wanted to experiment by initializing our neural network very close to the optimal solution to see if it would start to diverge from it. In doing so, I ran into the problem of initializing our weights and biases to a specific vector of values instead of sampling from a PyTorch distribution or using, say, zeros or ones. I’ve posted part of my code below; what I’d like to do is replace the lines setting fc1’s bias and weights with initial values I prescribe, while leaving the rest of the code the same. Looking through the documentation I wasn’t able to find a way to replace values in a parameter tensor directly, but something like this must happen internally when the bias is set with something like uniform, so it should be possible.

Thanks in advance, and if anything isn’t clear, I’ll do my best to clarify as soon as possible.

class MyModel(torch.nn.Module):
    def __init__(self, N=train_args.numofpoints - 2, lr=train_args.lr):
        super(MyModel, self).__init__()
        self.N = N
        self.fc1 = nn.Linear(1, int(N/4) - 1)  # layer one
        nn.init.uniform_(self.fc1.bias, 0, 1)
        self.fc1.weight.data.uniform_(-1, 1)

        self.fc2 = nn.Linear(int(N/4) - 1, N - int(N/4))  # layer two
        #self.fc2.weight.data.uniform_(-0.5,0.5)
        self.fc2.bias.data.fill_(0)
        self.fc2.weight.data.fill_(0)
        self.fc3 = nn.Linear(N - int(N/4), N)  # final hidden layer
        self.optimizer = optim.Adam(self.parameters(), lr=lr)

You should be able to use .copy_:

lin = nn.Linear(10, 10)
weight = torch.ones(10, 10)
bias = torch.ones(10) * 2

with torch.no_grad():
    lin.weight.copy_(weight)
    lin.bias.copy_(bias)
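
Applied to the model above, the same pattern can replace the fc1 initialization lines inside __init__. A minimal sketch, where my_weight and my_bias are placeholder tensors holding whatever values you want to prescribe:

self.fc1 = nn.Linear(1, int(N/4) - 1)  # layer one, as before
with torch.no_grad():
    # my_weight must have shape (int(N/4) - 1, 1) and my_bias shape (int(N/4) - 1,)
    self.fc1.weight.copy_(my_weight)
    self.fc1.bias.copy_(my_bias)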

This makes sense, but how would we set this up for arbitrary weights or biases? Say I have N = 16 in my network, so we need 3 weights and biases for the first layer, and let’s say I wanted those initial weights to be [-0.95, 3, 11.1] (these numbers are arbitrary ones I chose, not specific ones I want to use). How would I pass those in as my weights? I presume just by replacing the .ones with a tensor of those values, but I’m not entirely sure how .copy_ handles dimensionality.

Yes, you can replace the weight and bias tensors with your desired values:

lin = nn.Linear(3, 1)
weight = torch.tensor([[-0.95, 3, 11.1]])
bias = torch.tensor([22.])

with torch.no_grad():
    lin.weight.copy_(weight)
    lin.bias.copy_(bias)

You also need to make sure the shape matches:

lin = nn.Linear(3, 1)
weight = torch.tensor([[-0.95, 3, 11.1, 33.]])
bias = torch.tensor([22.])

with torch.no_grad():
    lin.weight.copy_(weight)
    lin.bias.copy_(bias)

# RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 1
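
Note that in your model fc1 = nn.Linear(1, int(N/4)-1), i.e. nn.Linear(1, 3) for N = 16, so its weight has shape (3, 1) — (out_features, in_features) — rather than (1, 3). A sketch using your example values (the bias values here are arbitrary):

lin = nn.Linear(1, 3)                           # fc1 for N = 16
weight = torch.tensor([[-0.95], [3.], [11.1]])  # shape (3, 1): one row per out_feature
bias = torch.tensor([0.1, 0.2, 0.3])            # shape (3,); arbitrary example values

with torch.no_grad():
    lin.weight.copy_(weight)
    lin.bias.copy_(bias)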

Not sure if I should write here as a follow-up to this problem, but something else has come up in other tests that makes me wonder about alternative ways to approach it. Specifically, if we have a network with, say, 3 layers, and we initialize all 3 layers the above way, including the torch.no_grad() guard, we get a network that doesn’t learn. My understanding is that this means we cannot take gradients of any layer, so the network cannot modify its weights and biases if we use a gradient-based optimizer. This leads me to ask: is there a way to do this same initialization while still being able to use a gradient-based optimizer? Our purpose is to start the network near a very good solution and see how it performs from a strong starting point instead of a random one.

Thanks in advance; if this should be a separate thread, let me know and I will move it.

That should not be the case; the parameters should still be trainable.
The no_grad() guard is needed to allow the in-place manipulation of the parameters, since the value assignment itself is not differentiable.

Here is a small example showing the parameters are still updated:

lin = nn.Linear(3, 1)
optimizer = torch.optim.Adam(lin.parameters(), lr=1.)

weight = torch.tensor([[-0.95, 3, 11.1]])
bias = torch.tensor([22.])

with torch.no_grad():
    lin.weight.copy_(weight)
    lin.bias.copy_(bias)

print(lin.weight)
# Parameter containing:
# tensor([[-0.9500,  3.0000, 11.1000]], requires_grad=True)

out = lin(torch.randn(1, 3))
out.mean().backward()
print(lin.weight.grad)
# tensor([[-0.4528, -0.2959, -0.3983]])

optimizer.step()
print(lin.weight)
# Parameter containing:
# tensor([[ 0.0500,  4.0000, 12.1000]], requires_grad=True)

Thanks for your response. I’ve definitely misunderstood the no_grad() guard. We’ve been having issues where our network doesn’t seem to be learning (i.e., if we plot the loss it just oscillates between different values at the same order of accuracy), and the fact that, with the initialization described above, our network isn’t changing its weights and biases at all now indicates there’s something else going wrong that I need to figure out. Thanks again for all your clarifications 🙂

Could you check if all of the used parameters are getting a valid gradient, e.g. via:

for name, param in model.named_parameters():
    print(name, param.grad)

If that’s not the case, you might indeed have another error in your training code.
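
Another quick sanity check, sketched here assuming model is an instance of your MyModel above, is to snapshot a parameter before a training step and compare it afterwards:

before = model.fc1.weight.detach().clone()  # snapshot before the step

# ... run one forward/backward pass and optimizer.step() ...

# True here would confirm the parameter was never updated
print(torch.equal(before, model.fc1.weight))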