Gradients from a hook triggered by backward() and by autograd.grad are different

Hi, recently I noticed that the gradients caught in a hook (through the grad_out parameter) triggered by a backward() call do not have requires_grad = True, nor do they have a grad_fn attribute. However, when I trigger the same hook through torch.autograd.grad, the values of grad_out caught in the hook are the same, yet they do have requires_grad = True and a grad_fn attribute. I'm curious why this is happening. Thank you so much.

Here is my toy example for reference:

import torch
torch.manual_seed(0)

# define embedding and linear layers
embedding_layer = torch.nn.Embedding(10, 5, padding_idx=0)
fc = torch.nn.Linear(5, 6)

random_ix = torch.randint(high=10, size=(5,))

embedding_list = []
def hook(module, grad_in, grad_out):
    print("grad out", grad_out[0]) # if triggered through autograd, this will have a grad_fn and requires_grad equal to True, otherwise not
    embedding_list.append(grad_out[0])

# register hook on embedding layer
embedding_layer.register_backward_hook(hook)

# do forward pass 
embeds = embedding_layer(random_ix)
output = fc(embeds)
print("output", output)
merged = torch.sum(output, dim=1)
summed = merged.sum()
print(summed)

# trigger hook through autograd
grad_auto = torch.autograd.grad(summed, embedding_layer.weight, create_graph=True)
print("grad auto", grad_auto[0][random_ix])

# trigger hook through backward() call 
summed.backward()

Maybe if we use

summed.backward(create_graph=True)

then we will get the same results.
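
For what it's worth, here is a minimal standalone sketch of that idea (the layer, input, and hook names here are placeholders of my own, not the exact setup above), comparing what the hook sees with and without create_graph=True:

import torch

torch.manual_seed(0)
layer = torch.nn.Linear(5, 3)

def hook(module, grad_in, grad_out):
    g = grad_out[0]
    print("requires_grad:", g.requires_grad, "| grad_fn:", g.grad_fn)

# same hook registration style as the toy example above
# (newer PyTorch releases prefer register_full_backward_hook)
layer.register_backward_hook(hook)

x = torch.randn(4, 5)

# plain backward(): the hook should see a detached grad_out
# (requires_grad=False, grad_fn=None)
layer(x).sum().backward()

# backward(create_graph=True): the backward pass itself is recorded,
# so grad_out should carry requires_grad=True and have a grad_fn
layer(x).sum().backward(create_graph=True)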


The requires_grad=True field is set only if you use create_graph=True.
In the grad call, if you just want to be able to call backward again, you should use retain_graph=True.
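
To illustrate that distinction with a small self-contained sketch (toy layer and variable names of my own, not the code from the question):

import torch

torch.manual_seed(0)
fc = torch.nn.Linear(5, 3)
x = torch.randn(4, 5)
loss = fc(x).sum()

# retain_graph=True keeps the forward graph so it can be backpropagated again,
# but the returned gradient is a plain tensor: requires_grad=False, grad_fn=None
(g_retain,) = torch.autograd.grad(loss, fc.weight, retain_graph=True)
print(g_retain.requires_grad, g_retain.grad_fn)

# create_graph=True additionally records the backward computation itself,
# so the returned gradient is differentiable: requires_grad=True with a grad_fn
(g_create,) = torch.autograd.grad(loss, fc.weight, create_graph=True)
print(g_create.requires_grad, g_create.grad_fn)

# a later backward() still works, since create_graph implies retain_graph
loss.backward()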


This worked. Thanks!

That makes sense, thanks!