MultiLabelMarginLoss max operation

I see that the MultiLabelMarginLoss loss function has a max operation:

loss(x, y) = sum_ij(max(0, 1 - (x[y[j]] - x[i]))) / x.size(0)

How is backpropagation possible when the max operation isn't differentiable?
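For context, here is a pure-Python sketch of that formula for a single sample (the function name is mine, not PyTorch's): j runs over the valid targets in y (up to the first -1), and i runs over the non-target classes. It reproduces the 0.85 result from the example in the PyTorch docs.

```python
def multilabel_margin_loss(x, y):
    """Sketch of loss(x, y) = sum_ij max(0, 1 - (x[y[j]] - x[i])) / x.size(0)
    for one sample. j indexes the valid targets in y (stop at the first -1);
    i indexes every class that is not a target."""
    targets = []
    for t in y:
        if t < 0:          # -1 marks the end of the valid targets
            break
        targets.append(t)
    target_set = set(targets)
    total = 0.0
    for j in targets:
        for i in range(len(x)):
            if i in target_set:
                continue   # only compare targets against non-targets
            total += max(0.0, 1 - (x[j] - x[i]))
    return total / len(x)  # normalize by x.size(0)

# Example from the PyTorch docs: x = [0.1, 0.2, 0.4, 0.8], y = [3, 0, -1, 1]
print(multilabel_margin_loss([0.1, 0.2, 0.4, 0.8], [3, 0, -1, 1]))  # 0.85
```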

I believe this thread answers your question: max acts as an identity function for the maximum element, so the gradient flows through that element only (and is zero for the rest).
Confused about torch.max() and gradient
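To make that concrete, here is a small pure-Python sketch (no PyTorch needed; the function names are mine): the subgradient of max is 1 at the argmax and 0 elsewhere, which a central-difference check confirms away from ties.

```python
def grad_of_max(xs):
    """Subgradient of max(xs): 1 for the argmax element, 0 for the others,
    because max acts as the identity on its largest input."""
    i = xs.index(max(xs))
    return [1.0 if j == i else 0.0 for j in range(len(xs))]

def finite_diff_grad(xs, eps=1e-6):
    """Numerical gradient of max(xs) via central differences."""
    grads = []
    for j in range(len(xs)):
        up = xs[:]; up[j] += eps
        dn = xs[:]; dn[j] -= eps
        grads.append((max(up) - max(dn)) / (2 * eps))
    return grads

xs = [0.3, 2.0, -1.5]
print(grad_of_max(xs))       # [0.0, 1.0, 0.0]
print(finite_diff_grad(xs))  # agrees: gradient reaches only the max element
```

At an exact tie the function is not differentiable, and autograd simply picks one subgradient; this is harmless in practice since ties occur with probability zero for continuous inputs.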