I need to modify the vanilla implementation of the backward function and change the number of samples processed per layer during backward propagation. Hence, I would like to know where I can find the implementation of the backward pass that takes place layer by layer. I searched my installation and found that most of the heavy lifting is done inside Variable._execution_engine.run_backward(), but for some reason I am not able to see the implementation of this method.
Do you really need to dig into the underlying code, or would your use case also be possible using backward hooks?
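For reference, here is a minimal sketch of what module backward hooks expose. This is my own toy example (the two-layer model and the hook body are illustrative, not from your code): `register_full_backward_hook` fires once per module during the backward pass, handing you the gradients flowing into and out of that module.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
seen = []  # records which modules the backward pass visits, in order

def inspect_grads(module, grad_input, grad_output):
    # grad_output holds the gradients flowing *into* this module from
    # the layers above; you could log, buffer, or modify them here.
    seen.append(type(module).__name__)

for layer in model:
    layer.register_full_backward_hook(inspect_grads)

x = torch.randn(16, 4)      # mini-batch of 16 samples
loss = model(x).sum()
loss.backward()             # hooks fire layer by layer, in reverse order

print(seen)                 # the last Linear is visited first
```

The hook can also return modified gradients to replace `grad_input`, which is how people usually intercept the per-layer gradient flow without touching the engine itself.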
Unfortunately, I suspect that will not suffice. I need to modify the way PyTorch computes these gradients in the graph for a given mini-batch of data.
What I mean is, I would ideally want to buffer up some gradients (corresponding to data in a mini-batch) right before back-propagating them through layer i. Hence, just as the forward pass is explicitly exposed (meaning you can see how the data flows through the layers), I would like to see the backward pass too.
Let me try to explain myself a bit further. I want to perform training (forward and backward passes) using a different batch size for every layer, and hence need to change the implementation of backward().
Moreover, I have come to understand that it is common to reduce (average out) the losses at the last layer over all samples in a mini-batch and then back-propagate the single scalar value. I would hence like to know whether back-propagation at intermediate layers also takes place in this reduced manner. Also, do we store the intermediate activations of all samples at all layers (and if so, can I see the code), or are they reduced as well?
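On the reduction question, a small experiment can answer part of it directly. This is my own sketch (the single `Linear` layer and shapes are arbitrary): even though the loss is reduced to a scalar before calling backward(), the gradient flowing through an intermediate activation still carries the full batch dimension, one row per sample.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 8)
x = torch.randn(16, 4)      # mini-batch of 16 samples

h = layer(x)                # intermediate activation, shape (16, 8)
h.retain_grad()             # keep its gradient for inspection after backward

loss = h.pow(2).mean()      # reduced to a single scalar, as is common
loss.backward()

print(loss.dim())           # 0 -- the loss itself is a scalar
print(h.grad.shape)         # (16, 8) -- one gradient per sample, not reduced
```

So the reduction happens only at the loss; the intermediate activations (and their gradients) are kept per-sample, since the chain rule needs the full activation tensors saved during the forward pass.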