I saw that some PyTorch models use nearest-neighbor interpolation in their upsampling layers (in convolutional decoders, for instance). But to my knowledge, nearest-neighbor interpolation is not differentiable. Shouldn't the model be unable to learn anything because of that?
Or does backpropagation work similarly to the way it does for a pooling layer in that case?
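For reference, here is a minimal sketch of what I mean (using `F.interpolate` with `mode="nearest"` as an assumed stand-in for the upsampling layers I saw). Gradients do seem to flow back to the input:

```python
import torch
import torch.nn.functional as F

# A small input we can track gradients on
x = torch.randn(1, 1, 2, 2, requires_grad=True)

# Nearest-neighbor upsampling: each input pixel is copied to a 2x2 block
y = F.interpolate(x, scale_factor=2, mode="nearest")

# Backpropagate a simple scalar loss
y.sum().backward()

# Each input pixel receives the summed gradient of its 4 output copies
print(x.grad)
```

So the backward pass appears to route each output gradient back to the single input pixel it was copied from, accumulating over the copies.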