With zero padding in Conv, the padded values are constants (zeros) rather than outputs of the previous layer, so no gradient flows back to the previous layer through the padded positions.
What about circular or reflect mode? In these two modes, the padded values are slices of the previous layer's output, so I would expect the gradient to flow back through them to the previous layer.
I am not familiar with CUDA. Could anyone tell me how this is implemented in PyTorch?
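
One way to check the behavior described above without reading the CUDA source is a small autograd experiment. The sketch below uses `torch.nn.functional.pad` as a stand-in for the padding step. This assumes, as I understand it, that Conv layers with a non-zero `padding_mode` pad the input this way before convolving. The helper name `input_grad` and the tensor sizes are made up for illustration.

```python
import torch
import torch.nn.functional as F

def input_grad(mode):
    """Pad a length-4 signal by 1 on each side, sum it, and return d(sum)/d(input)."""
    x = torch.ones(1, 1, 4, requires_grad=True)  # (batch, channels, length)
    padded = F.pad(x, (1, 1), mode=mode)         # "constant" pads with zeros by default
    padded.sum().backward()
    return x.grad.flatten().tolist()

zero_grad = input_grad("constant")  # pads are constants: each input counted once
circ_grad = input_grad("circular")  # pads copy x[3] and x[0]
refl_grad = input_grad("reflect")   # pads copy x[1] and x[2]

print(zero_grad)  # [1.0, 1.0, 1.0, 1.0]
print(circ_grad)  # [2.0, 1.0, 1.0, 2.0]
print(refl_grad)  # [1.0, 2.0, 2.0, 1.0]
```

In the circular and reflect cases, the input elements that were copied into the padding receive an extra gradient contribution, which suggests autograd does propagate through the padded values in those modes. With zero padding, the input still gets its usual gradient, but nothing extra arrives via the padded positions.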