I have three simple questions.
- What will happen if my custom loss function is not differentiable? Will PyTorch throw an error or do something else?
- If I declare a loss variable in my custom function that represents the final loss of the model, should I set `requires_grad = True` for that variable, or does it not matter? If it doesn't matter, why not?
- I have seen people sometimes write a separate layer and compute the loss in the `forward` function. Which approach is preferable, writing a function or a layer? Why?
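For context on the last question, here is a minimal sketch of the two approaches I mean (the names `my_loss_fn` and `MyLoss` are just my own placeholders):

```python
import torch
import torch.nn as nn

# Approach 1: a plain function that computes the loss
def my_loss_fn(pred, target):
    return ((pred - target) ** 2).mean()

# Approach 2: a layer that computes the same loss in forward
class MyLoss(nn.Module):
    def forward(self, pred, target):
        return ((pred - target) ** 2).mean()

pred = torch.randn(4, 3, requires_grad=True)
target = torch.randn(4, 3)

loss_a = my_loss_fn(pred, target)
loss_b = MyLoss()(pred, target)

# Both produce a scalar tensor that supports backward()
loss_a.backward()
```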
I have seen relevant posts here before, but I would like a clear, complete explanation to resolve my confusion. Please help.