When implementing a custom loss function, is it better to:
create a plain Python function that takes in tensors and computes the loss,
OR
implement it by inheriting from the nn.Module class?
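To make the two options concrete, here is a simplified sketch of what I mean by each method (the names and the MSE-style loss are just placeholders for illustration):

```python
import torch
import torch.nn as nn

# Method 1: a plain Python function built from differentiable torch ops
def custom_loss_fn(pred, target):
    return torch.mean((pred - target) ** 2)

# Method 2: subclassing nn.Module, with the computation in forward()
class CustomLoss(nn.Module):
    def __init__(self):
        super().__init__()  # no parameters registered; the loss is stateless

    def forward(self, pred, target):
        return torch.mean((pred - target) ** 2)
```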
Also, since an activation function has no parameters (it isn't really a 'layer'), how can we make sure the gradients are passed through if we use method 2?
I have tried both methods and found that method 2 prevented gradients from passing, whereas with method 1 there is no problem with backprop.
While implementing via inheritance from nn.Module, for some reason the gradients weren't flowing backwards. Is there any particular reason for this? Do we need to declare any variables in the constructor for it to work? (Since an activation just takes an input and passes it through a non-linearity, I didn't declare any variables in the constructor; I only wrote the forward() function. Could this be the reason no gradients are being calculated?)
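For reference, this is the general shape of what I tried for the activation, heavily simplified (class name and non-linearity are placeholders, not my actual code):

```python
import torch
import torch.nn as nn

class MyActivation(nn.Module):
    # nothing declared in __init__, only a forward() method
    def forward(self, x):
        return torch.clamp(x, min=0.0)  # a ReLU-like non-linearity

x = torch.randn(3, requires_grad=True)
y = MyActivation()(x).sum()
y.backward()
print(x.grad)  # I would expect a non-None gradient here
```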