I am trying to understand the behavior of max pooling during forward and backward propagation (especially backward). Unfortunately, I have trouble making sense of the
I would really appreciate it if you could explain what is stored during the forward pass (if anything) and how that information is used during the backward pass. A short snippet of code in Python would be very much appreciated.
The max pooling layer outputs the maximum of the values covered by each kernel window. So the gradient of each output with respect to its window is 1 for the selected value and 0 for all the others.
So during the backward pass, the gradient of each output is (multiplied by 1 and) copied to the position of the selected value in the input gradient. All other positions are set to 0.
To do so, it uses the indices returned by the forward pass.
The CPU implementation is here, with gradInput initialized to zeros.
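Since a Python snippet was requested, here is a minimal pure-Python sketch of the idea. It is not the actual library implementation: the function names `maxpool2d_forward` and `maxpool2d_backward` are made up for illustration, and it assumes a single 2D input with non-overlapping k x k windows (stride equal to the kernel size, no padding).

```python
def maxpool2d_forward(x, k):
    """Non-overlapping k x k max pooling on a 2D list.

    Returns the pooled output and, for each output cell, the (row, col)
    position of the maximum in the input. Those indices are what gets
    saved for the backward pass.
    """
    h, w = len(x), len(x[0])
    out, idx = [], []
    for i in range(0, h - k + 1, k):
        out_row, idx_row = [], []
        for j in range(0, w - k + 1, k):
            # Scan the window and remember where the maximum sits.
            best = (i, j)
            for di in range(k):
                for dj in range(k):
                    if x[i + di][j + dj] > x[best[0]][best[1]]:
                        best = (i + di, j + dj)
            out_row.append(x[best[0]][best[1]])
            idx_row.append(best)
        out.append(out_row)
        idx.append(idx_row)
    return out, idx


def maxpool2d_backward(grad_out, idx, h, w):
    """Route each output gradient to the saved argmax position."""
    grad_in = [[0.0] * w for _ in range(h)]  # initialized to zeros
    for i, row in enumerate(idx):
        for j, (r, c) in enumerate(row):
            grad_in[r][c] += grad_out[i][j]
    return grad_in


x = [[1.0, 3.0, 2.0, 0.0],
     [4.0, 2.0, 1.0, 5.0],
     [0.0, 1.0, 7.0, 2.0],
     [6.0, 2.0, 3.0, 1.0]]

out, idx = maxpool2d_forward(x, 2)
grad_out = [[10.0, 20.0], [30.0, 40.0]]  # made-up upstream gradient
grad_in = maxpool2d_backward(grad_out, idx, 4, 4)
```

Only the four argmax positions in `grad_in` receive a value; every other entry stays at zero.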
Thank you for your answer.
The only question that remains is how such indices help when training in batches. My understanding is that the total loss value at the output is the mean of the individual loss values for each sample in the minibatch.
With your explanation, it sounds like we need to backpropagate the loss value for each individual sample in the minibatch so that we can decide which indices should have a zero gradient and which ones should be multiplied by one.
P.S. In the source code, the name of the function includes single_out_frame, which might mean that the samples are in fact processed one at a time during the backward pass.
You have one set of indices per element in the batch.
As you noted, the function I linked only processes one sample; the for loop that runs through the batch is just below here.
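To illustrate the batched case, here is a rough pure-Python sketch using 1D pooling to keep it short. The helper names `pool1d_forward` and `pool1d_backward` are hypothetical, and the loss is assumed to be the mean over samples of the sum of the pooled outputs. The averaging over the batch only shows up as a 1/N factor in the incoming gradient; each sample still uses its own saved indices.

```python
def pool1d_forward(x, k):
    """Non-overlapping 1D max pooling; returns (output, argmax indices)."""
    out, idx = [], []
    for start in range(0, len(x) - k + 1, k):
        window = x[start:start + k]
        m = max(range(k), key=lambda t: window[t])  # offset of the max
        out.append(window[m])
        idx.append(start + m)
    return out, idx


def pool1d_backward(grad_out, idx, n):
    """Scatter each output gradient to its saved input position."""
    grad_in = [0.0] * n
    for g, i in zip(grad_out, idx):
        grad_in[i] += g
    return grad_in


batch = [[1.0, 5.0, 2.0, 3.0],
         [4.0, 0.0, 1.0, 9.0]]
N = len(batch)

# Forward: one set of indices per element in the batch.
outs, all_idx = zip(*(pool1d_forward(x, 2) for x in batch))

# Assumed loss: mean over samples of sum(out), so d(loss)/d(out) = 1/N.
grad_outs = [[1.0 / N] * len(o) for o in outs]

# Backward: loop over the batch, each sample using its own indices.
grad_batch = [pool1d_backward(g, i, len(x))
              for g, i, x in zip(grad_outs, all_idx, batch)]
```

The two samples end up with nonzero gradients at different positions because their maxima were at different positions in the forward pass.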