Requires_grad propagation

Luigi · August 23, 2020, 11:46am

Hi guys, as mentioned in this post, I have some tensor operations, and one of the vectors involved contains the parameters I want to optimise.
That said, I’m having trouble propagating requires_grad through these simple operations.
Every time forward is called, v is computed as:

v = self.sym @ self.v_short + self.fixed

where self.v_short is a torch.nn.Parameter vector, which is exactly what I want to train, while self.sym and self.fixed are plain tensors created with torch.tensor(): a constant matrix and a constant vector, respectively (hence no gradient or training needed). As expected, self.sym and self.fixed have requires_grad=False, while self.v_short has requires_grad=True.
However, for some reason I cannot figure out, v has requires_grad=False.

While trying to figure out what the problem was, I came across multiple posts (like this one) stating that:

if a leaf node requires_grad , all subsequent nodes computed from it will automatically also require_grad

which makes sense! And I’m wondering if you could explain why this is not happening for me?
Thx in advance for helping!
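
For reference, here is a minimal sketch of the setup described above. The class name SymModule, the shapes, and the choice to register self.sym and self.fixed as buffers are assumptions made for illustration, not the original code. With gradient tracking enabled (the default), this version gives a v that requires grad:

```python
import torch
import torch.nn as nn


class SymModule(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # Trainable vector: the only quantity to be optimised.
        self.v_short = nn.Parameter(torch.ones(2))
        # Constants: stored as buffers (an assumption), so they get no gradient.
        self.register_buffer("sym", torch.rand(4, 2))
        self.register_buffer("fixed", torch.rand(4))

    def forward(self):
        # Same expression as in the post.
        return self.sym @ self.v_short + self.fixed


module = SymModule()
v = module()
print(v.requires_grad)        # True when grad mode is enabled
print(v.grad_fn is not None)  # v is the result of a tracked operation
```

If the same expression yields requires_grad=False in a larger framework, the difference is likely to lie in the context in which forward runs rather than in the expression itself.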

JuanFMontesinos · August 23, 2020, 4:08pm

Because an nn.Parameter is not a leaf node. The other ones are constants (buffers inside an nn.Module). In fact, you don’t really need to use an nn.Module. You can optimize a tensor directly.

import torch

cte1 = torch.rand(5, 3).requires_grad_(False)
cte2 = torch.rand(5, 5).requires_grad_(False)
tensor = torch.ones(3, 5).requires_grad_()

optim = torch.optim.SGD([tensor], lr=1)

print(f'Initial tensor'
      f'{tensor}')
for i in range(5):
    optim.zero_grad()
    print(f'Iteration {i}')
    output = cte1 @ tensor + cte2
    print(f'Requires grad? {output.requires_grad}')
    output.sum().backward()
    print(f'Tensor gradients \n'
          f' {tensor.grad}')
    optim.step()
    print(f'Tensors \n'
          f' {tensor}')

Initial tensortensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], requires_grad=True)
Iteration 0
Requires grad? True
Tensor gradients 
 tensor([[1.4718, 1.4718, 1.4718, 1.4718, 1.4718],
        [1.7690, 1.7690, 1.7690, 1.7690, 1.7690],
        [2.1010, 2.1010, 2.1010, 2.1010, 2.1010]])
Tensors 
 tensor([[-0.4718, -0.4718, -0.4718, -0.4718, -0.4718],
        [-0.7690, -0.7690, -0.7690, -0.7690, -0.7690],
        [-1.1010, -1.1010, -1.1010, -1.1010, -1.1010]], requires_grad=True)
Iteration 1
Requires grad? True
Tensor gradients 
 tensor([[1.4718, 1.4718, 1.4718, 1.4718, 1.4718],
        [1.7690, 1.7690, 1.7690, 1.7690, 1.7690],
        [2.1010, 2.1010, 2.1010, 2.1010, 2.1010]])
Tensors 
 tensor([[-1.9435, -1.9435, -1.9435, -1.9435, -1.9435],
        [-2.5379, -2.5379, -2.5379, -2.5379, -2.5379],
        [-3.2019, -3.2019, -3.2019, -3.2019, -3.2019]], requires_grad=True)
Iteration 2
Requires grad? True
Tensor gradients 
 tensor([[1.4718, 1.4718, 1.4718, 1.4718, 1.4718],
        [1.7690, 1.7690, 1.7690, 1.7690, 1.7690],
        [2.1010, 2.1010, 2.1010, 2.1010, 2.1010]])
Tensors 
 tensor([[-3.4153, -3.4153, -3.4153, -3.4153, -3.4153],
        [-4.3069, -4.3069, -4.3069, -4.3069, -4.3069],
        [-5.3029, -5.3029, -5.3029, -5.3029, -5.3029]], requires_grad=True)
Iteration 3
Requires grad? True
Tensor gradients 
 tensor([[1.4718, 1.4718, 1.4718, 1.4718, 1.4718],
        [1.7690, 1.7690, 1.7690, 1.7690, 1.7690],
        [2.1010, 2.1010, 2.1010, 2.1010, 2.1010]])
Tensors 
 tensor([[-4.8871, -4.8871, -4.8871, -4.8871, -4.8871],
        [-6.0759, -6.0759, -6.0759, -6.0759, -6.0759],
        [-7.4039, -7.4039, -7.4039, -7.4039, -7.4039]], requires_grad=True)
Iteration 4
Requires grad? True
Tensor gradients 
 tensor([[1.4718, 1.4718, 1.4718, 1.4718, 1.4718],
        [1.7690, 1.7690, 1.7690, 1.7690, 1.7690],
        [2.1010, 2.1010, 2.1010, 2.1010, 2.1010]])
Tensors 
 tensor([[-6.3588, -6.3588, -6.3588, -6.3588, -6.3588],
        [-7.8448, -7.8448, -7.8448, -7.8448, -7.8448],
        [-9.5048, -9.5048, -9.5048, -9.5048, -9.5048]], requires_grad=True)

Luigi · August 24, 2020, 11:15am

Hey Juan, thx for replying!
You gave me a very good hint: I just checked whether self.v_short is a leaf node, and it is!
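
A check of this kind might look like the sketch below. The standalone tensors are stand-ins for self.v_short and self.sym, with made-up shapes; the actual check used is not shown in the thread.

```python
import torch
import torch.nn as nn

v_short = nn.Parameter(torch.ones(3))  # stand-in for self.v_short
sym = torch.rand(5, 3)                 # stand-in for self.sym

# A parameter created directly by the user is a leaf that requires grad.
print(v_short.is_leaf, v_short.requires_grad)

# The result of an operation on it is a non-leaf tensor with a grad_fn.
v = sym @ v_short
print(v.is_leaf, v.requires_grad, v.grad_fn is not None)
```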

However, requires_grad for v is still False.

Unfortunately, it has to be an nn.Module because it’s part of a bigger framework that I’m modifying.

So, I’m stuck again and I don’t know what is wrong!
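
Some things that could be checked at this point, sketched under assumptions: none of the cases below is confirmed as the cause in this thread. The sketch shows two common ways the same expression ends up with requires_grad=False even though the parameter requires grad: the forward pass runs with gradient tracking disabled (for example inside torch.no_grad(), as an evaluation loop of a surrounding framework might do), or the parameter is detached before use. The tensors are stand-ins with made-up shapes.

```python
import torch
import torch.nn as nn

v_short = nn.Parameter(torch.ones(3))  # stand-in for self.v_short
sym = torch.rand(5, 3)                 # stand-in for self.sym
fixed = torch.rand(5)                  # stand-in for self.fixed

# Case 1: default grad mode, so the result is tracked.
v_on = sym @ v_short + fixed
print(torch.is_grad_enabled(), v_on.requires_grad)

# Case 2: gradient tracking disabled around the computation.
with torch.no_grad():
    grad_mode_inside = torch.is_grad_enabled()
    v_off = sym @ v_short + fixed
print(grad_mode_inside, v_off.requires_grad)

# Case 3: the parameter is detached (cut from the graph) before use.
v_detached = sym @ v_short.detach() + fixed
print(v_detached.requires_grad)
```

Printing torch.is_grad_enabled() and v.grad_fn inside forward would distinguish these cases from one another.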