Why is backward() needed in Function?

The docs for Function's forward() say "Keep in mind that only Variables will be passed in here" (http://pytorch.org/docs/0.1.12/notes/extending.html). After reading many posts about creating custom functions/losses, I've concluded that operations on Variables get gradients automatically through autograd. So I'm wondering why a Function still requires a backward() implementation if the inputs to forward() are Variables rather than Tensors (I understand that a Module's forward() accepts Variables, so no backward() is needed there). I think I must be missing something, but I'd like to know what it is. Thanks. A minimal sketch of the kind of custom Function I mean is below.
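For reference, here is a minimal sketch of such a custom Function, written against the 0.2-era API (MulTwo is just a made-up example, not anything from the library). The point I'm asking about is that autograd apparently does not record what happens inside forward(), so the gradient formula has to be supplied by hand in backward():

```python
import torch
from torch.autograd import Function, Variable

class MulTwo(Function):
    """Hypothetical example: y = 2 * x, written against the 0.2-era API."""

    @staticmethod
    def forward(ctx, input):
        # autograd does not trace the operations in here, so the chain rule
        # cannot be applied to them automatically
        return input * 2

    @staticmethod
    def backward(ctx, grad_output):
        # gradient of 2*x w.r.t. x is 2, supplied by hand
        return grad_output * 2

x = Variable(torch.randn(3), requires_grad=True)
y = MulTwo.apply(x)   # outside the Function we still work with Variables
y.sum().backward()
print(x.grad)         # should be all 2s
```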

I just debugged a bit using gradcheck(): what is finally passed to forward() is a Tensor, not a Variable. Some other docs also say "It can take and return an arbitrary number of tensors." (http://pytorch.org/docs/master/autograd.html) I guess what happens underneath is that inside the Function everything is a Tensor, while outside the Function PyTorch unwraps the input Variables into Tensors and wraps the outputs back into Variables in order to build the computation graph. This is a bit confusing for new users, and I'm still not sure whether I'm right about it.
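For completeness, this is roughly how I checked it, using the same hypothetical MulTwo sketch as above. The print inside forward() is just to inspect the type, and gradcheck() is called as in the extending notes (double precision inputs; the eps/atol values are the ones from the docs):

```python
import torch
from torch.autograd import Function, Variable, gradcheck

class MulTwo(Function):
    @staticmethod
    def forward(ctx, input):
        print(type(input))   # prints a Tensor type, not Variable
        return input * 2

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 2

# gradcheck compares the hand-written backward() against numerical gradients;
# it expects double precision inputs
x = Variable(torch.randn(4).double(), requires_grad=True)
print(gradcheck(MulTwo.apply, (x,), eps=1e-6, atol=1e-4))  # True if backward() is correct
```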

My mistake: I was previously reading the 0.1.12 docs, and the 0.2.0 docs confirm my thinking above. I was misled by this old, unofficial doc: http://pytorch-zh.readthedocs.io/en/latest/notes/extending.html