Thanks, Tom, for your response, and our sincere apologies for not replying to your post earlier.
Before continuing my previous thread, let me clarify that our custom loss function does contain a few non-differentiable operations, e.g. histogram creation and counting values greater than a threshold. We were hoping that PyTorch Autograd could automatically generate approximate derivatives for these operations.
Let me give you an overview of our attempts at both, and ask for your valuable insights into whether these workarounds have any chance of working with PyTorch Autograd or any other PyTorch-compatible library.
Operation 1: Counting values greater than a threshold.
We need the per-row count of values that are greater than a threshold. We have tried to come up with a workaround, but based on your reply we suspect it may not work.
The workaround:
Diff = 1 - In
Required_Indexes = In >= th
NotRequired_Indexes = In < th
In[NotRequired_Indexes] = 0  # Force values below the threshold to 0; gradients won't flow for these, but that might be OK if gradients for the other values can still flow
Diff[NotRequired_Indexes] = 0
In = In + Diff  # Every remaining value becomes In + (1 - In) = 1
Out = torch.sum(In, 1)  # Desired output: per-row count
Should this work? Is there a better, cleaner way to do it? Or are any such workarounds doomed to fail, with PyTorch Autograd at least?
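To make the question concrete, here is a runnable version of the workaround above (the tensor values and threshold are illustrative; `torch.where` replaces the in-place masking, which would otherwise error on a leaf tensor that requires grad):

```python
import torch

th = 0.5
In = torch.tensor([[0.2, 0.6, 0.9],
                   [0.7, 0.1, 0.3]], requires_grad=True)

x = In.clone()                    # avoid in-place edits on a leaf tensor
diff = 1 - x
not_required = x < th             # "NotRequired_Indexes"
x = torch.where(not_required, torch.zeros_like(x), x)
diff = torch.where(not_required, torch.zeros_like(diff), diff)
out = torch.sum(x + diff, dim=1)  # per-row count of values >= th

out.sum().backward()
print(out.detach().tolist())      # [2.0, 1.0]
print(In.grad.abs().sum().item()) # 0.0 -- each kept entry is x + (1 - x),
                                  # whose derivative is zero, so no gradient flows
```

The forward pass does produce the right counts, but since every surviving entry is `x + (1 - x)`, its derivative is `1 - 1 = 0` and the gradient vanishes everywhere.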
Operation 2: Histogram.
We need to take a weighted sum of the histogram of a tensor. As the histogram operation is not differentiable, the workaround we have used is as follows. We already know the range of values in the tensor (named OutTensor below):
New_empty_A  # Tensor of the same size as OutTensor
New_empty_B  # Tensor of the same size as OutTensor
Weights      # Contains the weight of each bin at its index
# For example, if OutTensor can have only 3 values (0, 1, 2), then:
OutTensor = OutTensor + 1  # Shift the values by 1 first to get rid of 0s
Weights = [0, weight_0/1, weight_1/2, weight_2/3]
for UniqueValue in torch.unique(OutTensor):  # Loop over all possible values of OutTensor
    New_empty_A[OutTensor == UniqueValue] = Weights[UniqueValue]
New_empty_B = New_empty_A * OutTensor  # Each weight was already divided by its corresponding value, so after this operation New_empty_B holds the actual weights and we can simply sum it
Final_Out = torch.sum(New_empty_B)
Does this workaround make any sense with the mechanics of Autograd?
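For reference, here is a runnable sketch of this workaround (the values, weights, and the assumption that OutTensor only takes the values 0, 1, 2 are all illustrative):

```python
import torch

weight_0, weight_1, weight_2 = 10.0, 40.0, 90.0

OutTensor = torch.tensor([0.0, 1.0, 1.0, 2.0], requires_grad=True)
shifted = OutTensor + 1                  # shift by 1 to get rid of 0s
Weights = torch.tensor([0.0, weight_0 / 1, weight_1 / 2, weight_2 / 3])

per_element = torch.zeros_like(shifted)
for v in (1, 2, 3):                      # all possible shifted values
    per_element = torch.where(shifted == v, Weights[v], per_element)

New_empty_B = per_element * shifted      # (weight / value) * value = weight
Final_Out = New_empty_B.sum()

Final_Out.backward()
print(Final_Out.item())         # 180.0 == 1*10 + 2*40 + 1*90, the weighted histogram
print(OutTensor.grad.tolist())  # [10.0, 20.0, 20.0, 30.0] -- gradients do flow here
```

Note that a gradient does reach OutTensor through the final multiplication (it equals `per_element`), even though the bin-lookup itself contributes no gradient.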
I’m not sure I completely understand the snippets. But generally speaking, weighted sums will work, while counts won’t. One thing you could do, if you have a view of where the mass should move from a given histogram bucket, is fake a gradient in that direction.
For histograms, it could also work to use KDE (kernel density estimation) or similar instead, or to use the Wasserstein distance to a target distribution or some such.
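To make the KDE idea concrete, here is a minimal sketch of a differentiable "soft histogram" built from Gaussian kernels (the bin centers, bandwidth, and weights below are assumptions for illustration):

```python
import torch

def soft_histogram(x, centers, bandwidth=0.1):
    # Each sample contributes a Gaussian bump to every bin instead of a
    # hard 0/1 assignment, so the result is differentiable in x.
    d = x.unsqueeze(1) - centers.unsqueeze(0)  # (N, B) sample-to-center distances
    return torch.exp(-0.5 * (d / bandwidth) ** 2).sum(dim=0)

x = torch.tensor([0.1, 0.5, 0.52, 0.9], requires_grad=True)
centers = torch.tensor([0.0, 0.5, 1.0])
bin_weights = torch.tensor([1.0, 2.0, 3.0])

h = soft_histogram(x, centers)
loss = (h * bin_weights).sum()  # the weighted sum from the question
loss.backward()                 # x.grad is now nonzero everywhere
```

The bandwidth trades off smoothness of the gradient against how closely the soft counts match the hard histogram.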
You could write your own autograd.Function that computes the histogram from the data in the forward. In the backward you return e.g. grad_output * negative_sign_of_the_direction_you_want_things_to_move.
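As a concrete illustration of that last suggestion, here is a minimal custom autograd.Function for the count operation: the forward computes the exact count, and the backward returns a hand-crafted surrogate gradient (the particular sigmoid-shaped fake gradient here is my own assumption, not a prescribed recipe):

```python
import torch

class CountAbove(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, th):
        ctx.save_for_backward(x)
        ctx.th = th
        # Exact, non-differentiable count per row.
        return (x >= th).float().sum(dim=1)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Fake gradient: pretend the count grows as values move up,
        # concentrating the signal near the threshold (a sigmoid bump).
        bump = torch.sigmoid(10 * (x - ctx.th)) * torch.sigmoid(10 * (ctx.th - x))
        return grad_output.unsqueeze(1) * bump, None

x = torch.tensor([[0.2, 0.6, 0.9]], requires_grad=True)
count = CountAbove.apply(x, 0.5)
count.sum().backward()
print(count.item())  # 2.0 -- exact count in the forward
# x.grad is nonzero everywhere: a usable surrogate gradient in the backward.
```

The forward stays exact, so the loss value is unchanged; only the backward is faked, which is exactly the split an autograd.Function allows.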