Layer weight vs. weight.data

I’m new to PyTorch. Suppose you want to set layer weights to specific values. Is it better to set mynet.layer.weight[i][j] or mynet.layer.weight.data[i][j], or does it matter?

For example, mynet.layer.weight[0][0] = 9.9999 and mynet.layer.weight.data[0][0] = 9.9999 both seem to have the same effect. The weight is a Parameter object and weight.data is a Tensor object but I don’t know what the implications are.

This is just investigation on my part. Thanks to some previous feedback from user ‘ptrblck’, I already understand the torch.nn.init functions for practical use.


Manipulating the weights directly would most likely give you an error when you try to call backward:

import torch
import torch.nn as nn

lin = nn.Linear(10, 2)
lin.weight[0][0] = 1.  # direct in-place write on the Parameter
x = torch.randn(1, 10)
output = lin(x)
output.mean().backward()
> RuntimeError: leaf variable has been moved into the graph interior

Using .data, on the other hand, would work, but is generally not recommended, as changing it after the model was used would yield weird results and autograd cannot throw an error.
I would recommend using with torch.no_grad():

import torch
import torch.nn as nn

lin = nn.Linear(10, 2)
with torch.no_grad():
    lin.weight[0][0] = 1.  # the assignment is not recorded by autograd

x = torch.randn(1, 10)
output = lin(x)
output.mean().backward()  # works as expected
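
As a side note, here is a minimal sketch of how the same pattern extends to whole rows or the full tensor; this example is my own addition and uses only standard PyTorch calls:

import torch
import torch.nn as nn

lin = nn.Linear(10, 2)

with torch.no_grad():
    lin.weight[0] = torch.full((10,), 9.9999)        # overwrite a whole row
    lin.weight.copy_(torch.ones_like(lin.weight))    # or replace every entry

# the parameter is still a valid leaf and still requires grad
print(lin.weight.is_leaf, lin.weight.requires_grad)  # True True

As far as I understand, the torch.nn.init functions mentioned above do essentially the same thing internally, i.e. they perform their writes without tracking gradients.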

Ah! That makes perfect sense. I noticed that manipulating the weights directly changed the ‘requires_grad=True’ in the printout to ‘grad_fn=<CopySlices>’. Thank you! JM

It works!
Thank you very much :grin:


Thanks for the answer, ptrblck!

I am wondering why this is:

" but is generally not recommended, as changing it after the model was used would yield weird results and autograd cannot throw an error."

Are there any other reasons for this besides the autograd issue?

I think it’s because an arbitrary manipulation is not a valid operation from autograd’s point of view, so it cannot yield valid gradients.
Have a look at this dummy example:

import torch

x = torch.randn(1)
w = torch.randn(1, requires_grad=True)

y = x * w
# w[0] = 1.  # in-place manipulation of the leaf tensor w
y.backward()
print(w.grad)

If you uncomment that particular line, you’ll get an error.

Note that manipulating the .data attribute might yield a wrong result in your calculations, which I would consider a strong argument against using this approach.
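
To make that concrete, here is a minimal sketch (my own example, not from the original posts): since .data shares storage with the tensor but its changes are not tracked by autograd, modifying it between the forward and the backward pass gives a silently wrong gradient instead of an error:

import torch

w = torch.full((1,), 2.0, requires_grad=True)
y = w * w          # dy/dw = 2*w, so the correct gradient is 4.0
w.data[0] = 10.0   # change the value via .data after the forward pass
y.backward()
print(w.grad)      # tensor([20.]) instead of tensor([4.]), and no error is raised

A direct in-place write on w (as in the commented-out line of the dummy example above) would at least make autograd raise an error.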

Thanks, it worked for me!