I see that the MultiLabelMarginLoss loss function has a max operation:
loss(x, y) = sum_ij(max(0, 1 - (x[y[j]] - x[i]))) / x.size(0)
How is backpropagation possible when the max operation isn't differentiable?
I believe this thread answers your question: max acts as an identity function for the maximum element, so autograd routes the gradient entirely through the argmax and gives zero to the other elements (a subgradient at the tie points).
Confused about torch.max() and gradient
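A minimal sketch of this behavior (assuming PyTorch is installed): the gradient of `max` over elements flows only to the element that attained the maximum, and the gradient of the hinge term `max(0, z)` (written here with `torch.clamp`) is 1 where `z > 0` and 0 where `z < 0`.

```python
import torch

# max over elements: gradient flows only to the argmax.
x = torch.tensor([1.0, 3.0, 2.0], requires_grad=True)
x.max().backward()
print(x.grad)  # tensor([0., 1., 0.]) -- identity only for the maximum element

# max(0, z), the hinge term in the loss: gradient is 1 where z > 0, else 0.
z = torch.tensor([-0.5, 0.5], requires_grad=True)
torch.clamp(z, min=0).sum().backward()
print(z.grad)  # tensor([0., 1.])
```

So the loss is differentiable almost everywhere; at the non-differentiable points (ties, or a hinge term exactly at 0) autograd simply picks one valid subgradient.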