If I create a custom Function and apply it by calling its forward() method directly, the computed gradient seems independent of my backward() method and actually comes out correct, even though my backward() is wrong.
For example, with this code:
import torch
from torch.autograd import Function, Variable

class Cube(Function):
    def forward(self, input):
        self.save_for_backward(input)
        return input * input * input

    def backward(self, grad_output):
        input, = self.saved_tensors
        # wrong backward function:
        return grad_output

cube = Cube()
input = Variable(torch.ones(2, 2).double(), requires_grad=True)

output = cube(input).sum()
output.backward()
print(input.grad)  # gives [[1,1],[1,1]], which is what my (wrong) backward returns

input.grad.data.zero_()
output = cube.forward(input).sum()
output.backward()
print(input.grad)  # gives [[3,3],[3,3]], the correct gradient?!
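For reference, the backward I would expect to reproduce the [[3,3],[3,3]] result is something like the sketch below (d/dx of x**3 is 3*x**2), written with the same old-style Function API; the CubeCorrect name is just for illustration:

class CubeCorrect(Function):
    def forward(self, input):
        self.save_for_backward(input)
        return input * input * input

    def backward(self, grad_output):
        input, = self.saved_tensors
        # d/dx x**3 = 3 * x**2, chained with the incoming gradient
        return grad_output * 3 * input * input

With that backward, calling CubeCorrect()(input).sum().backward() on ones(2,2) should also give [[3,3],[3,3]].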