I want to create a new criterion as a black box (which uses numpy). Since autograd cannot compute the gradient through numpy code, I need to define both the forward() and backward() functions. The new criterion gets the output of the network; then, together with some other values, it computes both the loss and the gradients with respect to its input. Now,

Do I need to inherit from a module in PyTorch?

Do I need to iterate a for-loop to compute the loss for each sample within a batch and then compute the average loss?

Thank you so much for the hint. I implemented a function as below. loss_and_gradients is a numpy function which computes both the loss and the gradients w.r.t. the input. Once the function is called in the forward pass, it computes the gradients as well. Do I have to call the loss_and_gradients function again in the backward pass? I need the gradient only w.r.t. the input, and not the other input values (i.e. imgL, cost, imgD).

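No, you don’t have to call it again; you only have to make sure backward returns the gradients. That being so, you don’t have to save .loss to self; only the gradients computed in forward need to be kept. Also, backward takes one gradient argument per output of forward, and it should return one value per input of forward (None for the inputs that need no gradient, such as imgL, cost and imgD).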
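To make the advice above concrete, here is a minimal sketch of a numpy-backed criterion written as a torch.autograd.Function. It is only an illustration under assumptions: the real loss_and_gradients is not shown in this thread, so the version below is a hypothetical stand-in (mean squared error against a single extra input called target), and the class name NumpyCriterion is made up as well. The point is the pattern: call the numpy function once in forward, keep its gradient on ctx, and have backward return one value per forward input.

```python
import numpy as np
import torch


def loss_and_gradients(pred, target):
    """Hypothetical stand-in for the numpy black box: mean squared error
    over all elements, plus its gradient w.r.t. pred."""
    diff = pred - target
    loss = np.mean(diff ** 2)
    grad = 2.0 * diff / diff.size
    return loss, grad


class NumpyCriterion(torch.autograd.Function):
    @staticmethod
    def forward(ctx, output, target):
        # Leave the autograd graph and hand plain arrays to numpy.
        out_np = output.detach().cpu().numpy()
        tgt_np = target.detach().cpu().numpy()
        loss, grad = loss_and_gradients(out_np, tgt_np)
        # Keep the gradient computed here so backward need not recompute it.
        ctx.grad_input = torch.from_numpy(grad).to(output.device, output.dtype)
        # Return the loss as a 0-dim tensor matching output's dtype and device.
        return output.new_tensor(float(loss))

    @staticmethod
    def backward(ctx, grad_loss):
        # One argument per forward output (the loss), one return value per
        # forward input: a gradient for `output`, None for `target`.
        return grad_loss * ctx.grad_input, None


# Usage: the loss is already averaged over the batch inside the numpy function.
output = torch.randn(4, 3, requires_grad=True)
target = torch.randn(4, 3)
loss = NumpyCriterion.apply(output, target)
loss.backward()
```

With more non-differentiable inputs (imgL, cost, imgD in the question), forward would take them as extra arguments and backward would return one extra None for each of them.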