PyTorch's backward implementation of CUDA min/max functions?


I’m looking to understand PyTorch’s backward pass implementation for min(), max(), minimum(), and maximum() on CUDA tensors. The forward pass is straightforward, but the design of the backward pass seems tricky. I’ve searched aten/src/ATen/native/cuda but didn’t see any file implementing the corresponding backward passes. If anyone can point me toward where the backward logic for these CUDA ops is implemented, I’d appreciate it.

Hi! The backward implementations for min/minimum and max/maximum are found in pytorch/tools/autograd/derivatives.yaml, a file that declares the backward formulas explicitly.

The backward functions for max and min call into evenly_distribute_backward, which is defined in FunctionsManual.cpp. The backwards for maximum and minimum are written in the DSL explained in the big comment at the top of the derivatives.yaml file. Note that both of these end up using other ATen ops such as at::where, masked_fill, *, /, and sum. Since you asked specifically about CUDA tensors, you’ll then want to look at how those ATen ops are implemented in CUDA by checking native_functions.yaml. Many of them end up using our TensorIterator (blog for more details), which is a performant way of batching pointwise compute on CUDA.
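To make the two rules concrete, here is a rough pure-Python sketch of the gradient semantics (my own illustration, not PyTorch's actual code; the function names are made up, and it handles only flat lists rather than tensors):

```python
def max_backward(grad, xs):
    """Sketch of evenly_distribute_backward's semantics for a full
    reduction like x.max(): the incoming scalar gradient is split
    evenly across every element tied for the maximum."""
    m = max(xs)
    n = sum(1 for x in xs if x == m)  # count of tied maxima
    return [grad / n if x == m else 0.0 for x in xs]


def maximum_backward(grad, a, b):
    """Sketch of the elementwise maximum(a, b) rule expressed in
    derivatives.yaml: each gradient element flows to the larger
    input, and is halved on a tie (this is what the at::where /
    masked_fill combination in the DSL entry computes)."""
    ga = [0.0 if x < y else (g / 2 if x == y else g)
          for g, x, y in zip(grad, a, b)]
    gb = [0.0 if x > y else (g / 2 if x == y else g)
          for g, x, y in zip(grad, a, b)]
    return ga, gb
```

For example, `max_backward(1.0, [1.0, 3.0, 3.0])` returns `[0.0, 0.5, 0.5]`: the two tied maxima each receive half of the incoming gradient.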

Additionally, if you’d like more background, you may find our wiki on autograd basics useful.