Here is a short code snippet.
In [1]: import torch
In [2]: t1 = torch.rand(2,4)
In [3]: t1.requires_grad
Out[3]: False
In [4]: t1.requires_grad = True
In [5]: t1.requires_grad
Out[5]: True
In [6]: l = torch.nn.Linear(4,2)
In [7]: l.weight.data.requires_grad
Out[7]: False
In [8]: l.weight.data.requires_grad = True
In [9]: l.weight.data.requires_grad
Out[9]: False
In [10]: l.weight.requires_grad
Out[10]: True
In [11]: print(type(t1), type(l.weight), type(l.weight.data))
<class 'torch.Tensor'> <class 'torch.nn.parameter.Parameter'> <class 'torch.Tensor'>
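To give a bit more context for my first question below, here is another quick check I ran myself (not taken from the docs): once `requires_grad` is `True` on a plain tensor, calling `backward()` accumulates gradients into its `.grad`, while a `Parameter` has `requires_grad=True` from the start.

```python
import torch

# A plain tensor only participates in autograd once requires_grad is True.
t1 = torch.rand(2, 4)
t1.requires_grad = True
out = (t1 * 2).sum()
out.backward()
print(t1.grad)  # gradients accumulate on the leaf tensor t1

# A Parameter created inside a module has requires_grad=True by default.
l = torch.nn.Linear(4, 2)
print(l.weight.requires_grad)  # True
```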
I am confused about some of this output, specifically:

- What is the difference between a `torch.Tensor` with `requires_grad` (like `t1`) and a `torch.nn.parameter.Parameter` (like `l.weight`)? E.g., how do they influence the behavior when calculating gradients?
- Why does setting `t1.requires_grad` to `True` work, while setting `l.weight.data.requires_grad` to `True` does not, considering that `t1` and `l.weight.data` are both objects of class `torch.Tensor`?
- What is the relationship between `l.weight.data.requires_grad` and `l.weight.requires_grad`? The two seem not to be synchronized.
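For what it's worth, while experimenting (this is my own observation, not something I found documented) I noticed that each access to `l.weight.data` appears to return a fresh detached tensor object, which may be related to the second and third questions:

```python
import torch

l = torch.nn.Linear(4, 2)

# Each .data access seems to produce a new detached Tensor object
# that shares storage with the Parameter.
print(l.weight.data is l.weight.data)  # False in my runs

d = l.weight.data
d.requires_grad = True               # sticks on this particular object...
print(d.requires_grad)               # True
print(l.weight.data.requires_grad)   # ...but not on a fresh access: False
```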
I have read the related documentation, source code, and forum discussions, but I found it hard to form a comprehensive understanding that fully explains the questions above.
I would really appreciate it if someone could help.