Initialize nn.Linear with specific weights

Diego999 · November 7, 2018, 12:34pm

Hi everyone,

Basically, I have a matrix computed by another program that I would like to use as the weights of a layer in my network, and then keep updating these weights during training.

In [1]: import torch
In [2]: import torch.nn as nn
In [4]: linear_trans = nn.Linear(3,2)
In [5]: my_weights = torch.tensor([[1,2],[3,4],[5,6]])
In [6]: linear_trans.weight
Out[6]: 
Parameter containing:
tensor([[ 0.4409, -0.2647,  0.3328],
        [ 0.1049, -0.1887, -0.2617]], requires_grad=True)

In [7]: linear_trans.weight.data = my_weights
In [8]: linear_trans.weight
Out[8]: 
Parameter containing:
tensor([[1, 2],
        [3, 4],
        [5, 6]], requires_grad=True)

Is this enough to:

Use these weights as parameters for nn.Linear (without changing the values of the bias vector)
Have them updated automatically after .backward()

In addition:

If I would like to move them to the GPU, is it enough to do model.to(device)? (assuming my model uses the nn.Linear with the loaded weights)
If I want to freeze the weights, is using linear_trans.weight.data.requires_grad = False enough?

Thank you very much for your help, I just want to be sure as I’ve never done this kind of weight initialization

albanD · November 7, 2018, 12:46pm

You should not use .data anymore. With recent versions of PyTorch, use the with torch.no_grad(): context manager instead.
See how the nn.init module works, for example here.

And yes to all your questions otherwise, it will work exactly that way.
Note that setting requires_grad = False means that no gradients are computed for that tensor (or that they are kept at 0). This does not necessarily mean that the weights won’t be updated: for example, Adam will change the weights even for a gradient of 0 because of the momentum terms.
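
To make the note about Adam concrete, here is a minimal sketch (not from the original posts) that compares two cases after one ordinary training step: a gradient that exists but is all zeros, and a frozen weight whose gradient has been reset to None. The layer sizes, learning rate and random inputs are arbitrary assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(3, 2)
optimizer = torch.optim.Adam(linear.parameters(), lr=0.1)

# One ordinary step so that Adam accumulates momentum for the weight.
linear(torch.randn(4, 3)).sum().backward()
optimizer.step()

# Case 1: the gradient exists but is all zeros -> Adam still moves the weight.
linear.weight.grad.zero_()
linear.bias.grad.zero_()
before = linear.weight.detach().clone()
optimizer.step()
moved_with_zero_grad = not torch.equal(before, linear.weight.detach())

# Case 2: freeze the weight and drop its gradient -> the optimizer skips it.
linear.weight.requires_grad_(False)
linear.weight.grad = None
before = linear.weight.detach().clone()
linear(torch.randn(4, 3)).sum().backward()  # only the bias receives a gradient
optimizer.step()
frozen_stays_fixed = torch.equal(before, linear.weight.detach())

print(moved_with_zero_grad, frozen_stays_fixed)
```

In this sketch the frozen weight stays fixed because its .grad is None, so the optimizer has nothing to apply. Another option, under the same assumptions, is to pass only the trainable parameters to the optimizer in the first place.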

Diego999 · November 7, 2018, 12:59pm

Thank you for your answers and your note.

So if I understand correctly, this should be the way:

class MyModule(nn.Module):

    def __init__(self, weights):
        super(MyModule, self).__init__()

        self.linear = nn.Linear(weights.shape[1], weights.shape[0])
        with torch.no_grad():
            self.linear.weight = torch.tensor(weights) # nn.Parameter(...) ?

Is this enough for the weights to be updated with gradients later, or should I wrap them in nn.Parameter?

albanD · November 7, 2018, 1:01pm

You can change the original tensor in place, which avoids the question of which type you should create:

with torch.no_grad():
    self.linear.weight.copy_(your_new_weights)
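
As a self-contained sketch of the in-place approach: nn.Linear(in_features, out_features) stores its weight with shape (out_features, in_features), so a matrix laid out the other way round has to be transposed first. The matrix values and the assumption that the external program stores it as (in_features, out_features) are illustrative only.

```python
import torch
import torch.nn as nn

# Hypothetical matrix from another program, assumed to be laid out as
# (in_features, out_features) = (3, 2).
external = torch.tensor([[1, 2], [3, 4], [5, 6]])

linear = nn.Linear(3, 2)        # weight shape is (out_features, in_features) = (2, 3)
new_weights = external.t()      # transpose to match the layer's layout
assert new_weights.shape == linear.weight.shape

with torch.no_grad():
    # In-place copy: the weight stays the same nn.Parameter with
    # requires_grad=True, and copy_ casts the integers to the weight's dtype.
    linear.weight.copy_(new_weights)

# Moving the module afterwards moves the copied weights with it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
linear.to(device)

print(linear.weight)
```

Because the existing parameter is modified rather than replaced, the bias is left untouched and an optimizer created from linear.parameters() keeps pointing at the right tensor.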

Diego999 · November 7, 2018, 1:06pm

OK great, thank you very much!

Andy_Zhao · August 4, 2020, 4:20pm

Hi,

with torch.no_grad():
    w = torch.Tensor(weights).reshape(self.weight.shape)
    self.weight.copy_(w)

I have tried the code above, the weights are properly assigned to new values.
However, the weights just won’t update after loss.backward() if I manually assign them to new values. The weights become the fixed value that I assigned. (The weights are updated correctly if not manually assigned)

Could you please help me with this problem?
Your help is highly appreciated!

ptrblck · August 7, 2020, 5:19am

Could you post a code snippet to reproduce this issue? The posted code example seems to work:

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, weights):
        super(MyModule, self).__init__()

        self.linear = nn.Linear(weights.shape[1], weights.shape[0])
        with torch.no_grad():
            self.linear.weight.copy_(weights)
        
    def forward(self, x):
        x = self.linear(x)
        return x

weights = torch.randn(10, 10)
model = MyModule(weights)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

out = model(torch.randn(1, 10))
out.mean().backward()

for name, param in model.named_parameters():
    print(name, param.grad)

w0 = model.linear.weight.clone()
optimizer.step()
w1 = model.linear.weight.clone()
print(w1 - w0)
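
A hedged variation on this check, written as a comparison instead of a printout: for plain SGD without momentum or weight decay, the printed difference should equal -lr * grad up to floating-point rounding. The seed, the use of a bare nn.Linear instead of the module, and the tolerance are assumptions made for this sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
weights = torch.randn(10, 10)

linear = nn.Linear(weights.shape[1], weights.shape[0])
with torch.no_grad():
    linear.weight.copy_(weights)  # manually assigned weights, as in the thread

lr = 1e-3
optimizer = torch.optim.SGD(linear.parameters(), lr=lr)

linear(torch.randn(1, 10)).mean().backward()

w0 = linear.weight.detach().clone()
optimizer.step()
w1 = linear.weight.detach().clone()

# Plain SGD: w1 = w0 - lr * grad, so the difference should match -lr * grad.
step_matches = torch.allclose(w1 - w0, -lr * linear.weight.grad, atol=1e-6)
print(step_matches)
```

If the difference is all zeros in a real model instead, it may be worth checking that the optimizer was built from the same parameter objects that the forward pass uses.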