A 0-1 loss function would not be (usefully) differentiable. That is, after backpropagation,
the gradients computed for a model’s parameters would all be zero (hence giving the
optimizer no information about how to modify those parameters to reduce the loss).
This if-else construct is not differentiable. Setting requires_grad = True for the
return value doesn’t fix this.
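To make both points concrete, here is a minimal sketch (the tensors and the two hard-loss variants are made up for illustration, not taken from your code):

```python
import torch

input = torch.tensor([0.2, 0.8, 0.6], requires_grad=True)
target = torch.tensor([0.0, 1.0, 0.0])

# Version 1: a "hard" 0-1 style loss built from round(). round() is piecewise
# constant, so autograd assigns it a gradient of zero everywhere it is defined,
# and that zero propagates back to `input`.
hard_loss = (input.round() - target).abs().mean()
hard_loss.backward()
print(input.grad)        # tensor([0., 0., 0.]) -- no signal for the optimizer

input.grad = None        # clear the (zero) gradient before the second example

# Version 2: an if-else 0-1 loss. The returned tensor is a brand-new leaf with
# no grad_fn linking it back to `input`, so requires_grad = True just makes it
# a disconnected leaf of its own; backward() never reaches `input`.
def zero_one_loss(input, target):
    if torch.equal(input.round(), target):
        return torch.tensor(0.0, requires_grad=True)
    return torch.tensor(1.0, requires_grad=True)

loss = zero_one_loss(input, target)
loss.backward()
print(input.grad)        # None -- the loss is not connected to `input` at all
```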
(You could write a soft loss that is close to zero when input and target are nearly
equal, but is close to one when input and target differ significantly.)
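For example, something along these lines (the Gaussian-shaped form and the sharpness value are just one arbitrary choice):

```python
import torch

def soft_zero_one_loss(input, target, sharpness=10.0):
    # A smooth stand-in for the 0-1 loss: roughly 0 where input is close to
    # target, approaching 1 where they differ. `sharpness` (a made-up knob)
    # controls how quickly the transition happens.
    return (1.0 - torch.exp(-sharpness * (input - target) ** 2)).mean()

input = torch.tensor([0.2, 0.8, 0.6], requires_grad=True)
target = torch.tensor([0.0, 1.0, 0.0])

loss = soft_zero_one_loss(input, target)
loss.backward()
print(input.grad)   # nonzero -- now the optimizer has something to work with
```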
@KFrank Thanks. I was going to write an answer to my own question and say something similar: that this function is not differentiable and so is useless for backprop.