Can anyone please explain, with a simple example, what the exact meaning of grad_input and grad_output is? I tried to understand it from the documentation but I couldn't grasp it.
grad_output is the gradient coming from the output of the module during the backward pass, while grad_input is the gradient which will be passed to the corresponding input of the module during the backward pass.
Here is a small example using nn.ReLU():
import torch
import torch.nn as nn

m = nn.ReLU()
m.register_full_backward_hook(lambda module, grad_input, grad_output: print("grad_input: {}\ngrad_output: {}".format(grad_input, grad_output)))
x = torch.randn(2, 2, requires_grad=True)
print(x)
# tensor([[-0.5412, -0.2550],
# [-1.5957, 0.2068]], requires_grad=True)
out = m(x)
print(out)
# tensor([[0.0000, 0.0000],
# [0.0000, 0.2068]], grad_fn=<BackwardHookFunctionBackward>)
out.backward(gradient=torch.ones_like(out)*2.)
# grad_input: (tensor([[0., 0.],
# [0., 2.]]),)
# grad_output: (tensor([[2., 2.],
# [2., 2.]]),)
print(x.grad)
# tensor([[0., 0.],
# [0., 2.]])
As you can see, grad_output corresponds to the gradient I'm passing to backward, while grad_input is what the hook returns and what ends up in x.grad, since x is the input of the module.
Thanks a lot.
This makes sense to me now.
What if I need these grad_output values from the hooks in the main training loop? How should I get them after the backward call?
You can use backward hooks to access these gradients, as seen in my code snippet. If you need them after backward() returns, store them in a container from inside the hook.
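Here is one possible sketch of that idea (the grads dict and save_grads hook are just illustrative names, not a PyTorch API): the hook writes the gradients into a dict during backward(), so they remain accessible afterwards in the training loop.

```python
import torch
import torch.nn as nn

# Container populated by the hook; readable in the training loop after backward().
grads = {}

def save_grads(module, grad_input, grad_output):
    # Detach so we don't keep the autograd graph alive longer than needed.
    grads["grad_input"] = [g.detach() if g is not None else None for g in grad_input]
    grads["grad_output"] = [g.detach() if g is not None else None for g in grad_output]

m = nn.ReLU()
handle = m.register_full_backward_hook(save_grads)

x = torch.randn(2, 2, requires_grad=True)
out = m(x)
out.backward(gradient=torch.ones_like(out) * 2.)

# The gradients captured during the backward pass are now available here.
print(grads["grad_output"][0])  # the incoming gradient (all 2s)
print(grads["grad_input"][0])   # matches x.grad
handle.remove()  # remove the hook when it's no longer needed
```

Detaching inside the hook is a deliberate choice: if you kept references to the raw gradient tensors, the graph they belong to could be retained across iterations and grow memory usage.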