Why can PyTorch calculate the gradient of a loss function with quantiles?

Hi guys,

I have been working on a project related to neural networks recently.

I was surprised to find that PyTorch can calculate the gradient of a loss function that uses quantiles, because I thought the quantile calculation should be non-differentiable.

I was asked to explain the exact principle behind this: why can PyTorch calculate the gradient of a loss function with quantiles?

The following is a screenshot of my custom loss function.
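Roughly, it does something along these lines (a minimal sketch only, not the exact code from the screenshot; the quantile level `q` and the use of absolute errors are placeholders for illustration):

```python
import torch

def quantile_of_abs_error(pred, target, q=0.9):
    # Hypothetical stand-in for the actual loss: take the q-quantile
    # of the absolute prediction errors via torch.quantile.
    errors = (pred - target).abs()
    return torch.quantile(errors, q)

pred = torch.randn(8, requires_grad=True)
target = torch.randn(8)
loss = quantile_of_abs_error(pred, target)
loss.backward()   # surprisingly, this works
print(pred.grad)
```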

Can anyone answer my question?


The implementation of torch.quantile should be defined here, and based on the operations used, I don't see a reason why it should not be differentiable.
It seems that internally sorting and interpolation are used, which are both differentiable in PyTorch (in torch.sort only the sorted values will have a backward function, not the indices), so could you explain why you think it should not work?
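You can check this directly; a small sketch (the tensors here are just random examples):

```python
import torch

# The gradient flows to the value(s) that define the quantile,
# via the linear interpolation between neighboring sorted values.
x = torch.randn(5, requires_grad=True)
q = torch.quantile(x, 0.5)  # median: internally sort + interpolate
q.backward()
print(x.grad)  # nonzero only at the element(s) defining the median

# torch.sort shows the same behavior: the sorted values carry a
# grad_fn, while the returned indices are integer tensors and do not.
values, indices = torch.sort(torch.randn(5, requires_grad=True))
print(values.grad_fn)  # e.g. <SortBackward0 object at ...>
print(indices.dtype)   # torch.int64, so no gradient
```

Note that the quantile is only piecewise differentiable: the sorting itself gets no gradient, so autograd just routes the gradient to whichever input elements currently sit at the quantile position.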