Gradients with Argmax in PyTorch

Do you have a pointer to an implementation of the model in another framework? The derivative of argmax is zero almost everywhere (and undefined at ties), so it doesn’t seem likely that you can back-propagate through it in a way that is useful.
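
A quick way to see this in PyTorch (a minimal sketch, with a random tensor `x` standing in for whatever your model produces): `torch.argmax` returns integer indices that autograd does not track, so no gradient can flow back through it to the input.

```python
import torch

x = torch.randn(5, requires_grad=True)

idx = torch.argmax(x)
print(idx)                 # e.g. tensor(2) -- a LongTensor index
print(idx.requires_grad)   # False: argmax is piecewise constant, so autograd
                           # does not record a grad_fn for it

# Trying to back-propagate through the index fails, because the output
# is not connected to x in the autograd graph:
try:
    idx.float().sum().backward()
except RuntimeError as e:
    print(e)               # "element 0 of tensors does not require grad ..."
```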