Grad is None when doing loss.backward

duskybomb · June 8, 2020, 4:17pm

I am trying to calculate the gradient d(loss)/dj, but j.grad is None.

class model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(256, 2)
        
    def forward(self, j, labels):
        e = self.fc(j)
        print(labels.shape)
        print(e.shape)

        j.requires_grad = True
        
        with torch.enable_grad():
            loss = F.cross_entropy(e, labels, reduction='sum')
            j.retain_grad()
            loss.backward()
            grad = j.grad.detach()
            
        return grad

albanD · June 8, 2020, 4:30pm

Hi,

I think the problem is that you set j.requires_grad after j has already been used, so autograd does not record j in the graph during the forward pass.
You should move this line to the beginning of the forward() function.
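
For illustration, here is a minimal sketch of that reordering. The class name `Model`, the batch size of 4, and the random inputs are assumptions made for the example and are not from the original post. Note that loss.backward() also accumulates gradients into the parameters of self.fc.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(256, 2)

    def forward(self, j, labels):
        # Enable gradient tracking on the input *before* it is used,
        # so autograd records the path from j to the loss.
        j.requires_grad_(True)

        with torch.enable_grad():
            e = self.fc(j)
            loss = F.cross_entropy(e, labels, reduction='sum')
            loss.backward()

        # j is a leaf tensor, so its .grad is populated without retain_grad().
        return j.grad.detach()


model = Model()
j = torch.randn(4, 256)              # hypothetical batch of 4 inputs
labels = torch.randint(0, 2, (4,))   # hypothetical class labels
grad = model(j, labels)
print(grad.shape)
```

The returned gradient should have the same shape as j.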

fotinidelig · September 26, 2021, 7:21am

Hello!

I’m experiencing the same problem, but probably for a different reason.
Here is my code. I call requires_grad_(True) and then loss.backward(), but None is printed in the end, which is very odd. I’m using torch 1.9.1 and torchvision 0.10.1, the newest versions. If anyone has a clue about what might be happening, let me know!

    net.eval()

    adv_x = x.clone().detach().float().requires_grad_(True)

    # start from a random point near x
    rand = torch.zeros_like(x).uniform_(-eps, eps).float()
    adv_x = adv_x + rand
    adv_x = torch.clamp(adv_x, x_min, x_max)

    if not targeted:
        target = torch.argmax(net.forward(x))

    criterion = nn.CrossEntropyLoss()
    for _ in range(n_iters):
        pred = net.forward(adv_x)
        loss = criterion(pred, target)
        if targeted:
            loss = -loss
        loss.backward()
        print(adv_x.grad)

ptrblck · September 26, 2021, 7:58pm

You are replacing the original adv_x leaf tensor:

x = torch.randn(1)
adv_x = x.clone().detach().float().requires_grad_(True)
print(adv_x.requires_grad)
> True
print(adv_x.is_leaf)
> True

here:

eps = 1e-6
rand = torch.zeros_like(x).uniform_(-eps, eps).float()
adv_x = adv_x + rand
x_min, x_max = -1, 1
adv_x = torch.clamp(adv_x, x_min, x_max)
print(adv_x.requires_grad)
> True
print(adv_x.is_leaf)
> False

If you are trying to access the .grad attribute of adv_x, you will also get a warning which explains the returned None value:

y = adv_x * 2
y.backward()
print(adv_x.grad)
> None
UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.
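
As the warning suggests, one option is to call .retain_grad() on the non-leaf tensor. The snippet below is a small sketch of that, reusing the toy values from above; whether it fits the real attack code is an assumption.

```python
import torch

x = torch.randn(1)
adv_x = x.clone().detach().requires_grad_(True)   # leaf tensor

eps = 1e-6
rand = torch.zeros_like(x).uniform_(-eps, eps)
adv_x = torch.clamp(adv_x + rand, -1, 1)          # adv_x is now a non-leaf tensor
adv_x.retain_grad()                               # ask autograd to keep its .grad

y = adv_x * 2
y.backward()
print(adv_x.is_leaf)   # still a non-leaf
print(adv_x.grad)      # populated instead of None
```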

fotinidelig · September 27, 2021, 10:35am

Thank you, I solved it by doing

adv_x = adv_x.clone().detach().float().requires_grad_(True)

in each iteration!
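
For readers following along, this is roughly what such a loop could look like. It is only a sketch: the tiny `net`, the step size `alpha`, the signed-gradient update, and the projection back into the eps-ball are all assumptions, since the update step was not shown in the post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the objects in the post.
net = nn.Linear(8, 3).eval()
x = torch.rand(1, 8)
eps, alpha, n_iters = 0.1, 0.02, 5   # alpha (step size) is an assumption
x_min, x_max = 0.0, 1.0
criterion = nn.CrossEntropyLoss()

with torch.no_grad():
    target = torch.argmax(net(x), dim=1)
    # Start from a random point near x, clamped to the valid range.
    rand = torch.zeros_like(x).uniform_(-eps, eps)
    adv_x = torch.clamp(x + rand, x_min, x_max)

grads_seen = []
for _ in range(n_iters):
    # Re-create a fresh leaf tensor from the current value in each iteration.
    adv_x = adv_x.clone().detach().requires_grad_(True)
    loss = criterion(net(adv_x), target)
    loss.backward()
    grads_seen.append(adv_x.grad.clone())

    with torch.no_grad():
        # Assumed update rule: signed-gradient step, then projection
        # back into the eps-ball around x and into [x_min, x_max].
        adv_x = adv_x + alpha * adv_x.grad.sign()
        adv_x = torch.max(torch.min(adv_x, x + eps), x - eps)
        adv_x = torch.clamp(adv_x, x_min, x_max)

print(len(grads_seen))
```

Because adv_x is rebuilt as a leaf at the top of every iteration, adv_x.grad is populated after each backward() call and starts empty each time.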

ptrblck · September 27, 2021, 11:05pm

I don’t know where exactly you are creating this cloned tensor, but alternatively you could also use other variable names to make sure that the original adv_x isn’t overwritten.
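
A short sketch of the renaming idea, using the toy values from the earlier reply: the name `perturbed` and the fixed input 0.5 are hypothetical. The leaf keeps the name adv_x, so its .grad is populated, and that gradient flows back through the addition and the clamp.

```python
import torch

x = torch.tensor([0.5])   # hypothetical input, well inside (x_min, x_max)
eps = 1e-6
x_min, x_max = -1, 1

adv_x = x.clone().detach().requires_grad_(True)       # leaf, name is kept
rand = torch.zeros_like(x).uniform_(-eps, eps)
perturbed = torch.clamp(adv_x + rand, x_min, x_max)    # non-leaf, new name

y = perturbed * 2
y.backward()
print(adv_x.is_leaf)
print(adv_x.grad)
```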

fotinidelig · September 28, 2021, 5:30am

Hm, yes indeed, but adv_x is updated later in the code, and in my algorithm the grad is computed at the start of each iteration with the new value of adv_x. That means I would have to clone a variable there either way (obviously my earlier code wouldn’t have worked properly; the grad would have been wrong :))! Thanks!