Difference between torch.Tensor.requires_grad and torch.nn.parameter.Parameter

Here is a short code snippet.

```
In [1]: import torch

In [2]: t1 = torch.rand(2, 4)

In [3]: t1.requires_grad
Out[3]: False

In [4]: t1.requires_grad = True

In [5]: t1.requires_grad
Out[5]: True

In [6]: l = torch.nn.Linear(4, 2)

In [7]: l.weight.data.requires_grad
Out[7]: False

In [8]: l.weight.data.requires_grad = True

In [9]: l.weight.data.requires_grad
Out[9]: False

In [10]: l.weight.requires_grad
Out[10]: True

In [11]: print(type(t1), type(l.weight), type(l.weight.data))
<class 'torch.Tensor'> <class 'torch.nn.parameter.Parameter'> <class 'torch.Tensor'>
```

I am confused about some output of the code in terms of these things:

  1. What’s the difference between a plain torch.Tensor with requires_grad=True (like t1) and a torch.nn.parameter.Parameter (like l.weight)? E.g., how do they influence gradient computation?
  2. Why does setting t1.requires_grad to True work, while setting l.weight.data.requires_grad to True does not, given that t1 and l.weight.data are both objects of class torch.Tensor?
  3. What is the relationship between l.weight.data.requires_grad and l.weight.requires_grad? The two seem not to be synchronized.

I have read the related documentation, source code, and forum discussions, but I still find it hard to form a comprehensive understanding that fully explains the questions above.

I would really appreciate it if someone could help.

  1. nn.Parameter just wraps a tensor and sets its .requires_grad attribute to True, as seen here.

  2. Don’t use or depend on the internal .data attribute, as its user-facing usage is deprecated and it’s only used internally.

  3. Same as 2.
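
To illustrate these points, here is a minimal sketch (assuming a recent PyTorch version). It shows that `nn.Parameter` flips `requires_grad` on, that each access to `.data` returns a fresh detached tensor (which is why assigning to `l.weight.data.requires_grad` has no lasting effect), and the recommended `torch.no_grad()` pattern for modifying a parameter in place instead of going through `.data`:

```python
import torch

# nn.Parameter wraps a plain tensor and sets requires_grad=True by default
t = torch.rand(2, 4)
p = torch.nn.Parameter(t)
print(p.requires_grad)  # True
print(type(p))          # <class 'torch.nn.parameter.Parameter'>

l = torch.nn.Linear(4, 2)
# All parameters registered on a module require gradients by default,
# which is how optimizers know what to update
print(all(q.requires_grad for q in l.parameters()))  # True

# Each access to .data builds a new detached tensor sharing the same
# storage, so setting requires_grad on it mutates a temporary object
l.weight.data.requires_grad = True
print(l.weight.data.requires_grad)  # False (a fresh detached tensor)

# Preferred way to modify a parameter without recording the op in
# autograd: wrap the in-place change in torch.no_grad()
with torch.no_grad():
    l.weight.fill_(0.0)
print(l.weight.requires_grad)  # still True
```

The `torch.no_grad()` block achieves what people historically reached for `.data` to do, without the risk of silently breaking gradient tracking.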


Thank you @ptrblck; with your help I’m no longer confused.
