# Inherit from Function
class LinearFunction(Function):

    # Note that both forward and backward are @staticmethods
    @staticmethod
    # bias is an optional argument
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx, grad_output):
        # This is a pattern that is very convenient - at the top of backward
        # unpack saved_tensors and initialize all gradients w.r.t. inputs to
        # None. Thanks to the fact that additional trailing Nones are
        # ignored, the return statement is simple even when the function has
        # optional inputs.
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None

        # These needs_input_grad checks are optional and there only to
        # improve efficiency. If you want to make your code simpler, you can
        # skip them. Returning gradients for inputs that don't require it is
        # not an error.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)

        return grad_input, grad_weight, grad_bias
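To make this concrete, here is one way to exercise such a Function end to end (a minimal standalone sketch; the class is condensed from the code above so the snippet runs on its own). Custom Functions are invoked through `.apply`, never by calling `forward` directly:

```python
import torch
import torch.nn.functional as F
from torch.autograd import Function

# Condensed version of the LinearFunction discussed above.
class LinearFunction(Function):
    @staticmethod
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)
        return grad_input, grad_weight, grad_bias

# Invoke through .apply so autograd records the op in the graph.
x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(5, requires_grad=True)
out = LinearFunction.apply(x, w, b)
out.sum().backward()
```

The output should match `torch.nn.functional.linear(x, w, b)`, and after `backward()` all three leaf tensors carry gradients.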
I am confused about why forward takes in a Tensor. How can their use be registered in the graph if the inputs are just Tensors? Also, grad_output is a Variable whose requires_grad is False.
My custom autograd Function is quite complex and involves some non-differentiable operations, so I perform a few tricks to make it converge. I am wondering whether the following code is OK:

def backward(ctx, grad_output):
    grad_output = grad_output.data
    # do something else to get the approximate grad_output
    grad_input = Variable(grad_output, requires_grad=False)
    return grad_input
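In current PyTorch the same trick is expressed with the once_differentiable decorator, which hands backward plain Tensors so no graph is recorded there. A minimal sketch (ClampedIdentity is a hypothetical toy standing in for the "non-differentiable tricks" described above):

```python
import torch
from torch.autograd import Function
from torch.autograd.function import once_differentiable

class ClampedIdentity(Function):
    # Toy example: forward is the identity; backward clamps the gradient,
    # a stand-in for an approximate/non-differentiable backward pass.
    @staticmethod
    def forward(ctx, input):
        return input.clone()

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):
        # grad_output arrives as a plain Tensor here; no graph is built,
        # so higher-order derivatives through this op are disallowed.
        return grad_output.clamp(-1.0, 1.0)

x = torch.randn(3, requires_grad=True)
y = ClampedIdentity.apply(x)
y.sum().backward()
```

Since `y.sum()` sends a gradient of ones into backward and clamping leaves ones unchanged, `x.grad` comes out as a tensor of ones.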
I did as you said. However, it gives the error message:
NameError: name 'oncedifferentiable' is not defined.
Sorry, small typo: it is
once_differentiable, which you can import with
from torch.autograd.function import once_differentiable.
When I put @once_differentiable on top of
@staticmethod, the error
TypeError: 'staticmethod' object is not callable occurs.
Swap the two decorators.
You can see here how it is used.
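For clarity, the ordering that works is @staticmethod outermost and @once_differentiable applied directly to the function, as in this small sketch (MyDouble is a hypothetical toy Function):

```python
import torch
from torch.autograd import Function
from torch.autograd.function import once_differentiable

class MyDouble(Function):
    @staticmethod
    def forward(ctx, input):
        return input * 2

    @staticmethod          # must be the outermost decorator
    @once_differentiable   # applied first, closest to the function
    def backward(ctx, grad_output):
        return grad_output * 2

x = torch.randn(2, requires_grad=True)
out = MyDouble.apply(x)
out.sum().backward()
```

Putting @once_differentiable on the outside makes it try to call the staticmethod descriptor itself, which raises the TypeError above.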
Weird, when I do this, the param
grad_output becomes a Tensor instead of a Variable whose
requires_grad is False.
Oh, I forgot it was doing the unpacking for you so that you don't need to deal with
Variables. My bad. I edited my answer above.
So grad_input has been turned into a Tensor in my custom layer; will it be transformed back into a Variable when it gets to the previous layer?
Yes, as you got a Tensor as input, you should return a Tensor.
I am still a bit confused. For normal autograd operations, we return the corresponding gradient, such as
grad_input, as a Variable. Why do we need to make sure that we have created a proper graph? We could directly assign
grad_input to the variable
input; it seems to have nothing to do with whether the graph is proper.
If your backward function does not have the
once_differentiable decorator and does not create a proper graph in the backward function, then all higher-order derivatives will be wrong. There are helper functions here and here if you want to check your first- and second-order derivative implementations with finite differences.
Thanks albanD! You really helped me!