Hi, I recently noticed that the gradients caught by a module hook (through the grad_out parameter) do not have requires_grad = True or a grad_fn attribute when the hook is triggered by a backward() call. However, when I trigger the same hook through torch.autograd.grad(..., create_graph=True), the values of grad_out are identical, yet they do have requires_grad = True and a grad_fn attribute. I'm curious why this happens. Thank you so much.
Here is my toy example for reference:
import torch
torch.manual_seed(0)
# define embedding and linear layers
embedding_layer = torch.nn.Embedding(10, 5, padding_idx=0)
fc = torch.nn.Linear(5, 6)
random_ix = torch.randint(high=10, size=(5,))
embedding_list = []
def hook(module, grad_in, grad_out):
    # if triggered through autograd.grad, grad_out[0] will have a grad_fn and requires_grad == True; otherwise not
    print("grad out", grad_out[0])
    embedding_list.append(grad_out[0])
# register hook on the embedding layer
# (note: register_backward_hook is deprecated in newer PyTorch versions in favor of register_full_backward_hook)
embedding_layer.register_backward_hook(hook)
# do forward pass
embeds = embedding_layer(random_ix)
output = fc(embeds)
print("output", output)
merged = torch.sum(output, dim=1)
summed = merged.sum()
print(summed)
# trigger hook through autograd.grad (note create_graph=True)
grad_auto = torch.autograd.grad(summed, embedding_layer.weight, create_graph=True)
print("grad auto", grad_auto[0][random_ix])
# trigger hook through backward() call
summed.backward()
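For what it's worth, I suspect the distinguishing factor is create_graph rather than backward() vs. autograd.grad() itself: with create_graph=True, the backward pass is itself recorded by autograd, so the gradient tensors the hook sees are produced by tracked operations. Here is a minimal sketch checking that assumption by passing create_graph=True to backward() as well (variable names are my own, not from the original example):

```python
import torch

torch.manual_seed(0)
emb = torch.nn.Embedding(10, 5, padding_idx=0)
fc = torch.nn.Linear(5, 6)
ix = torch.randint(high=10, size=(5,))

captured = []

def hook(module, grad_in, grad_out):
    # stash the gradient w.r.t. the embedding layer's output
    captured.append(grad_out[0])

emb.register_backward_hook(hook)

summed = fc(emb(ix)).sum()
# with create_graph=True the backward pass itself is recorded,
# so the gradient seen by the hook should carry a grad_fn
summed.backward(create_graph=True)
print(captured[0].requires_grad, captured[0].grad_fn is not None)
```

If the hypothesis is right, this prints True for both, matching what autograd.grad(..., create_graph=True) produced, while a plain summed.backward() would print False.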