How does PyTorch implement the backward process?

I know we can write loss.backward() to run the gradient calculation, but when I read the source code for backward, I am confused.
I added some print statements to the PyTorch Python source code to trace the backward process.
Basically, loss.backward() calls the backward function and ends up in the Variable._execution_engine.run_backward function, which is implemented in C++. When we use DataParallel, the code eventually calls the torch.nn.parallel.Gather.backward and torch.nn.parallel.Broadcast.backward functions. How are these two classes called?
In addition, the Gather.backward function calls Scatter.apply, and Scatter.apply calls Scatter.forward. How does that work?
The Scatter class inherits from torch.autograd.Function, and Function is created by a Python metaclass. That metaclass builds a companion class that inherits from BackwardCFunction.
So in my opinion, calling Scatter.apply should end up calling Scatter.backward. But why doesn't it?
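To make the Gather/Scatter relationship concrete, here is a minimal pure-Python sketch (no torch, plain lists instead of tensors) of why the backward of a gather is a scatter and vice versa. The class names mirror torch.nn.parallel._functions, but the bodies are toy stand-ins, not the real implementation:

```python
# Toy sketch: the gradient of "gather chunks into one list" is
# "scatter the incoming gradients back into chunks", and vice versa.
# This mirrors why Gather.backward calls Scatter.apply in PyTorch.

class Scatter:
    @staticmethod
    def forward(tensors, chunks):
        # split a flat list into `chunks` roughly equal pieces
        n = len(tensors)
        k, r = divmod(n, chunks)
        out, i = [], 0
        for c in range(chunks):
            size = k + (1 if c < r else 0)
            out.append(tensors[i:i + size])
            i += size
        return out

    @staticmethod
    def backward(grad_chunks):
        # gradient of a scatter = gather the per-chunk grads together
        return Gather.forward(grad_chunks)

class Gather:
    @staticmethod
    def forward(chunks):
        # concatenate the chunks back into one flat list
        return [t for chunk in chunks for t in chunk]

    @staticmethod
    def backward(grads, chunks):
        # gradient of a gather = scatter the grads back out, which is
        # why the real Gather.backward ends up running Scatter's forward
        return Scatter.forward(grads, chunks)

parts = Scatter.forward([1, 2, 3, 4, 5], 2)  # [[1, 2, 3], [4, 5]]
whole = Gather.forward(parts)                # [1, 2, 3, 4, 5]
back = Gather.backward(whole, 2)             # same shape as `parts`
```

In the real code the chunk sizes and target devices are saved on the ctx during forward, but the duality is the same.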


Because there is a static apply function that is called when you do Scatter.apply(), and a non-static one that is called on an instance of Scatter: Scatter().apply().
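This two-apply setup can be sketched in pure Python. The names below mirror torch.autograd (Function, BackwardCFunction, a metaclass that builds a companion backward class), but the bodies are a deliberate simplification: the real static apply is THPFunction_apply in C++, and graph construction is reduced here to appending to a list:

```python
# Toy sketch of the two `apply`s in torch.autograd.Function.
# Everything here is a simplified stand-in, not the real implementation.

ctx_registry = []  # stand-in for autograd graph construction

class BackwardCFunction:
    # non-static apply: lives on the node *instance* recorded during
    # the forward pass, and dispatches to the user's backward()
    def apply(self, *grads):
        return self._forward_cls.backward(self, *grads)

class FunctionMeta(type):
    # the metaclass builds a companion backward class for every Function
    def __init__(cls, name, bases, attrs):
        cls._backward_cls = type(name + "Backward",
                                 (BackwardCFunction,),
                                 {"_forward_cls": cls})
        super().__init__(name, bases, attrs)

class Function(metaclass=FunctionMeta):
    @classmethod
    def apply(cls, *args):
        # "static" apply: runs forward() and records a backward node,
        # roughly what THPFunction_apply does on the C++ side
        ctx = cls._backward_cls()
        out = cls.forward(ctx, *args)
        ctx_registry.append(ctx)
        return out

class Double(Function):
    @staticmethod
    def forward(ctx, x):
        return 2 * x

    @staticmethod
    def backward(ctx, grad):
        return 2 * grad

y = Double.apply(3)        # static apply -> forward()
node = ctx_registry[-1]    # an instance of DoubleBackward
g = node.apply(1.0)        # non-static apply -> backward()
```

So Scatter.apply (on the class) triggers forward, while the engine later calls apply on a recorded ScatterBackward-style instance, which triggers backward.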

Is the static apply function from the _C._FunctionBase class?
I mean, is it THPFunction_apply? Thanks for your reply!

Yes, and it’s defined here as a static method.

Interesting! And I have a final question: could you tell me where the non-static apply function is called?

In the engine, when a new task can be executed here.
It first does some checking and input/output preparation here.
It then runs all the required hooks, then uses the operator() function on the actual Node (previously called Function) here.
This is routed to the main Node class here.
Which calls the apply function of the Node, which is a PyNode in your case, since it is implemented in Python.
And the PyNode gets and calls the .apply attribute from the class instance here.
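The chain of indirection above can be sketched in Python. This is a loose analogy of the C++ Node/PyNode classes (hook handling is reduced to a plain list of callables, and ScatterBackwardStub is a made-up stand-in for a BackwardCFunction instance):

```python
# Toy sketch of the engine's dispatch chain:
# operator() -> pre-hooks -> apply() -> post-hooks,
# where PyNode.apply calls back into the Python-side object.

class Node:
    def __init__(self):
        self.pre_hooks, self.post_hooks = [], []

    def __call__(self, *grads):
        # operator(): run pre-hooks, then apply(), then post-hooks
        for hook in self.pre_hooks:
            grads = hook(grads)
        out = self.apply(*grads)
        for hook in self.post_hooks:
            out = hook(out)
        return out

    def apply(self, *grads):
        raise NotImplementedError

class PyNode(Node):
    # a Node implemented in Python: its apply looks up the `.apply`
    # attribute on the Python-side function object and calls it
    def __init__(self, py_fn):
        super().__init__()
        self.py_fn = py_fn

    def apply(self, *grads):
        return self.py_fn.apply(*grads)

class ScatterBackwardStub:
    # hypothetical stand-in for a BackwardCFunction instance
    def apply(self, grad):
        return grad + 1

node = PyNode(ScatterBackwardStub())
node.pre_hooks.append(lambda grads: tuple(g * 10 for g in grads))
result = node(2)  # pre-hook scales 2 -> 20, stub apply gives 21
```

The real engine does this in C++ with device threads and ready queues, but the call shape (engine → operator() → Node::apply → Python .apply) is the same.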

Looking at it this way, there are a few levels of indirection indeed :smiley:

I understand now! Thanks a lot!