The problem is that, since the network outputs multiple labels at the same time, one has to backpropagate to only a specific subset of parameters. This is done by assigning a specific weight to each gradient, with many of them being 0 (lines 139 and 177-187 in the file linked above). Is there a way to implement a similar function in PyTorch? If so, any advice on how to do this? Thank you!
I had to implement my own loss function, which also specified custom gradients: it normalized the gradients of some regularization terms (in this case, the non-informative prior for the geodesic distance plus the L1 loss on the in-plane rotation) according to the softmax-conditioned outputs.
You can specify such custom forward and backward procedures using PyTorch Functions (`torch.autograd.Function`) or Modules (`torch.nn.Module`).
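As a rough illustration of the Function route, here is a minimal sketch, not the code from the linked file. It assumes the per-gradient weights are already available as a tensor with the same shape as the output. The names `WeightedGrad`, `logits`, and `grad_weights` are made up for this example. The forward pass is the identity, and the backward pass multiplies the incoming gradient by the weights, so entries with weight 0 receive no gradient.

```python
import torch


class WeightedGrad(torch.autograd.Function):
    """Identity in forward; scales the incoming gradient elementwise in backward."""

    @staticmethod
    def forward(ctx, x, grad_weights):
        # Keep the weights around for the backward pass.
        ctx.save_for_backward(grad_weights)
        # Return a view so autograd treats the output as a new tensor.
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        (grad_weights,) = ctx.saved_tensors
        # One return value per forward input; the weights get no gradient.
        return grad_output * grad_weights, None


torch.manual_seed(0)
logits = torch.randn(2, 4, requires_grad=True)

# Hypothetical weights: 0 blocks the gradient, other values rescale it.
grad_weights = torch.tensor([[1.0, 0.0, 0.0, 0.5],
                             [0.0, 1.0, 0.0, 0.0]])

out = WeightedGrad.apply(logits, grad_weights)
loss = out.sum()  # upstream gradient is all ones
loss.backward()

# With an all-ones upstream gradient, logits.grad equals grad_weights.
print(logits.grad)
```

If the custom gradient is only an elementwise rescaling like this, multiplying the per-element loss by the weights, or calling `logits.register_hook(lambda g: g * grad_weights)`, would probably be enough. A custom `Function` becomes useful when the backward pass has to compute something that autograd would not derive on its own.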