I had to implement my own loss function that also specified custom gradients, normalizing some regularization terms' gradients (in this case, the non-informative prior on the geodesic distance plus the L1 loss on the in-plane rotation) according to the softmax-conditioned outputs.
You can specify such custom forward and backward procedures using PyTorch Functions or Modules.
http://pytorch.org/docs/master/notes/extending.html demonstrates how you can obtain the incoming gradient and manipulate it, in this case to replicate the backward procedure shown in the Caffe code.
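To illustrate the pattern, here is a minimal sketch of a custom `torch.autograd.Function` whose backward rescales a regularizer's gradient. The loss and the normalization rule are hypothetical stand-ins, not my exact loss above:

```python
import torch
from torch.autograd import Function

class NormalizedRegLoss(Function):
    """Illustrative custom loss: an L1 data term plus a weighted
    regularizer, with a backward pass that renormalizes the
    regularizer's gradient (hypothetical example)."""

    @staticmethod
    def forward(ctx, pred, target, reg, reg_weight):
        ctx.save_for_backward(pred, target, reg)
        ctx.reg_weight = reg_weight
        main = (pred - target).abs().sum()          # L1 data term
        return main + reg_weight * reg.abs().sum()  # plus L1 regularizer

    @staticmethod
    def backward(ctx, grad_output):
        pred, target, reg = ctx.saved_tensors
        # Analytic gradient of the L1 data term w.r.t. pred.
        grad_pred = grad_output * (pred - target).sign()
        # Custom manipulation: rescale the regularizer's gradient so its
        # magnitude matches the data term's gradient before weighting.
        grad_reg = reg.sign()
        scale = grad_pred.norm() / (grad_reg.norm() + 1e-8)
        grad_reg = grad_output * ctx.reg_weight * scale * grad_reg
        # One gradient per forward input; non-differentiable args get None.
        return grad_pred, None, grad_reg, None
```

You would invoke it with `loss = NormalizedRegLoss.apply(pred, target, reg, 0.1)` and then call `loss.backward()` as usual.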
If your loss only requires standard differentiation, then you can just create an nn.Module and have autodiff handle the backward pass for you :).
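A quick sketch of that simpler route, with a made-up data term and regularizer just to show the structure; only forward is written, and autograd derives the gradients:

```python
import torch
import torch.nn as nn

class SimpleCombinedLoss(nn.Module):
    """Hypothetical loss module: defining forward is enough,
    autograd produces the backward pass."""

    def __init__(self, reg_weight=0.1):
        super().__init__()
        self.reg_weight = reg_weight

    def forward(self, pred, target):
        l1 = (pred - target).abs().mean()  # L1 data term
        reg = pred.pow(2).mean()           # stand-in regularizer
        return l1 + self.reg_weight * reg

# loss = SimpleCombinedLoss()(pred, target); loss.backward() just works.
```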
An example of this is available in my bundle of code here, for a structured Mahalanobis metric loss.
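For a rough idea of what such a metric loss looks like (this is a minimal sketch of the general idea, not the linked code): the squared Mahalanobis distance is d(x, y)^2 = (x - y)^T L L^T (x - y), where parameterizing the metric matrix M = L L^T through a learnable factor L keeps it positive semi-definite.

```python
import torch
import torch.nn as nn

class MahalanobisLoss(nn.Module):
    """Sketch of a learnable Mahalanobis metric loss: the metric
    M = L L^T is parameterized by a learnable factor L."""

    def __init__(self, dim):
        super().__init__()
        self.L = nn.Parameter(torch.eye(dim))  # init to the identity metric

    def forward(self, x, y):
        diff = (x - y) @ self.L                 # project the difference through L
        return diff.pow(2).sum(dim=-1).mean()   # mean squared Mahalanobis distance
```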