Query: Confusion regarding requires_grad default state when defining a layer

Hi,
New here. This may be silly but I am unsure about the default value of requires_grad for parameters of a nn object.

For example:

import torch
import torch.nn as nn

linear = nn.Linear(10, 10)
linear.train()
for params in linear.parameters():
    print('requires_grad of parameters -',params.data.requires_grad)
    print('requires_grad of output of layer - ',linear(torch.tensor([0.]*10)).requires_grad)
    break

Output:

requires_grad of parameters - False
requires_grad of output of layer -  True

Why do the parameters of a layer in train mode show requires_grad as False, while the output of the same layer has requires_grad True? I am confused. I just started with pytorch, so I am unsure whether I am making some instantiation mistake.

Also, nothing changes when the layer is put in eval mode:

linear = nn.Linear(10,10)
linear.eval()
for params in linear.parameters():
    print('requires_grad of parameters -',params.data.requires_grad)
    print('requires_grad of output of layer - ',linear(torch.tensor([0.]*10)).requires_grad)
    break

Output:

requires_grad of parameters - False
requires_grad of output of layer -  True

Thanks

Hi Rao!

Get rid of .data and try params.requires_grad.

.data has been deprecated for quite some time and can break things,
so you shouldn’t be using it.

(.data, in effect, digs the raw tensor – the one without requires_grad = True –
out of its wrapper tensor that carries requires_grad = True.)

Best.

K. Frank

Hi Frank,

Thanks for the suggestion. It all makes sense now!

Also, if I want to set the values of the weights manually, should .data still be avoided? I have been doing something like this in my code to update the weights:

for param in DNN1.parameters():
    param.data = theta_t[start:end].view(param.shape)

theta_t is just a tensor I generate from another process.

Rao

Hi Rao!

Again, .data is deprecated. It could, potentially, be removed from pytorch.
There is also no guarantee that it will do what you want.

The approved idiom for this is to use a torch.no_grad() block and an
in-place copy_():

with torch.no_grad():
    for param in DNN1.parameters():
        param.copy_(theta_t[start:end].view(param.shape))
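
If theta_t is a single flat vector that holds all of the parameters, the slice
bookkeeping could look something like this (a sketch only – DNN1 and the layout
of theta_t are assumptions, so adjust to whatever your other process actually
produces):

with torch.no_grad():
    offset = 0
    for param in DNN1.parameters():
        n = param.numel()
        # copy the matching slice of the flat vector into this parameter
        param.copy_(theta_t[offset:offset + n].view(param.shape))
        offset += n

torch.nn.utils.vector_to_parameters() does essentially the same thing, if you
prefer a single call.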

Best.

K. Frank
