Thresholding Operation in PyTorch with Gradients

While implementing a custom loss function, I need to threshold a tensor, and the gradients must still flow through that step during the .backward() pass with autograd.

I have a tensor of shape (N, 7) and need to count, for each row, how many values are greater than a threshold th. The result should be a tensor of shape (N, 1).

Toy example:

 In = [[1,2,3,4,5,6,7],
th = 5

Out = [[2],[2],[3]]

The problem is that when I threshold directly, the gradients vanish.
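A minimal sketch of the direct approach on the first row of the toy example (the tensor values and `th = 5.0` are taken from it; everything else is illustrative). The comparison `x > th` returns a boolean tensor that autograd does not track, so the count is cut off from `x` and no gradient can reach it:

```python
import torch

th = 5.0
x = torch.tensor([[1., 2., 3., 4., 5., 6., 7.]], requires_grad=True)  # first row of the toy input

mask = x > th                           # bool tensor; comparisons are not differentiable
count = mask.sum(dim=1, keepdim=True)   # per-row count, shape (N, 1), integer dtype

print(count)                # tensor([[2]])
print(count.requires_grad)  # False: the graph is cut at the comparison, so count.backward() raises
```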

So thresholding (setting small values to 0) is different from counting the number of values that exceed the threshold. Which do you need?
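To illustrate the distinction, here is a small sketch of *thresholding* in the sense above, using `torch.nn.functional.threshold`, which keeps entries greater than the threshold and replaces the rest with a given value. Unlike the count, it stays differentiable: the kept entries get gradient 1 and the zeroed ones get 0 (the toy row and `th = 5.0` are reused for illustration):

```python
import torch
import torch.nn.functional as F

th = 5.0
x = torch.tensor([[1., 2., 3., 4., 5., 6., 7.]], requires_grad=True)

# Keep x where x > th, otherwise use the replacement value 0.0.
y = F.threshold(x, th, 0.0)
y.sum().backward()

print(y)       # values: [[0., 0., 0., 0., 0., 6., 7.]]
print(x.grad)  # tensor([[0., 0., 0., 0., 0., 1., 1.]])
```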

Also, counting values gives a discrete result (1, 2, 3, ...). An infinitesimal change to a value (usually) doesn’t change how many values are above the threshold, so the count’s derivative is zero almost everywhere and undefined at the jumps: the function isn’t usefully differentiable.
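One common way around this, not something proposed in the thread itself, is to replace the hard 0/1 step with a smooth sigmoid relaxation. The sketch below assumes a hypothetical temperature parameter `tau`; smaller `tau` gets closer to the exact count but gives steeper, more localized gradients. Note that an entry exactly at the threshold contributes sigmoid(0) = 0.5:

```python
import torch

def soft_count(x: torch.Tensor, th: float, tau: float = 0.1) -> torch.Tensor:
    """Differentiable surrogate for (x > th).sum(dim=1, keepdim=True).

    Each entry contributes sigmoid((x - th) / tau), a value in (0, 1) that
    approaches the hard 0/1 indicator as tau -> 0.
    """
    return torch.sigmoid((x - th) / tau).sum(dim=1, keepdim=True)

x = torch.tensor([[1., 2., 3., 4., 5., 6., 7.]], requires_grad=True)
c = soft_count(x, th=5.0, tau=0.1)   # shape (N, 1)
c.sum().backward()

print(c)       # about 2.5: 6 and 7 contribute ~1 each, the entry equal to th contributes 0.5
print(x.grad)  # largest for entries near the threshold, ~0 far away from it
```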

Best regards


Thanks, Tom, for your response, and our sincere apologies for not replying earlier.

Before continuing the thread, let me clarify that our custom loss function does contain a few non-differentiable operations, e.g. histogram creation and counting values greater than a threshold. We were hoping that PyTorch autograd could automatically generate approximate derivatives for these operations.
Below is an overview of our attempts at both; we would value your insight into whether these workarounds have any chance of working with PyTorch autograd or any other PyTorch-compatible library.

Operation 1: Counting values greater than a threshold.

We need the per-row count of values greater than a threshold. We came up with the workaround below, but based on your reply we suspect it may not work.
The workaround:

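import torch

Diff = 1 - In
Required_Indexes    = In > th    # values to count ("greater than", as in the toy example)
NotRequired_Indexes = In <= th
# Force values not above the threshold to 0. masked_fill is out-of-place, which autograd can track
# (in-place masking of a leaf that requires grad raises an error). Gradients won't flow for these
# values, but that might be OK if gradients of the other values can flow.
In   = In.masked_fill(NotRequired_Indexes, 0)
Diff = Diff.masked_fill(NotRequired_Indexes, 0)
In   = In + Diff                          # In + (1 - In) = 1 wherever In > th, 0 elsewhere
Out  = torch.sum(In, 1, keepdim=True)     # desired output: per-row count, shape (N, 1)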

Should this work? Is there a better, cleaner way to do it? Or are all such workarounds doomed to fail, at least with PyTorch autograd?
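A quick check of the Operation 1 workaround on the toy row (a sketch using the toy values; names are lowercase versions of the snippet's). The count comes out right and the result does have a grad_fn, but every entry that survives the mask receives +1 through `In` and -1 through `Diff = 1 - In`, so the gradient reaching the input is exactly zero:

```python
import torch

th = 5.0
x = torch.tensor([[1., 2., 3., 4., 5., 6., 7.]], requires_grad=True)

below = x <= th                        # entries that should not be counted
diff = (1 - x).masked_fill(below, 0)   # 1 - x where x > th, else 0
kept = x.masked_fill(below, 0)         # x where x > th, else 0
ones = kept + diff                     # exactly 1 where x > th: x + (1 - x)
count = ones.sum(dim=1, keepdim=True)  # value [[2.]]

count.sum().backward()
print(count)   # correct count, with a grad_fn attached
print(x.grad)  # all zeros: d/dx [x + (1 - x)] = 1 - 1 = 0
```

So the graph is connected, but it carries no learning signal.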

Operation 2: Histogram.

We need the weighted sum of a tensor's histogram. Since the histogram is not differentiable, we used this workaround:

We already know the range of values in the tensor (named OutTensor below):

import torch

# For example, if OutTensor can have only 3 values: 0, 1, 2 (weight_0..weight_2 are the per-bin weights):
OutTensor = OutTensor + 1  # shift the values by 1 first to get rid of 0s
# Weight of each bin at its (shifted) value's index, divided by that value
Weights = torch.tensor([0, weight_0 / 1, weight_1 / 2, weight_2 / 3])

New_empty_A = torch.zeros_like(OutTensor)  # tensor of the same size as OutTensor
for UniqueValue in torch.unique(OutTensor.detach()):  # loop over the distinct values of OutTensor
    New_empty_A[OutTensor == UniqueValue] = Weights[UniqueValue.long()]
# We already divided each weight by its value, so after this multiplication New_empty_B holds
# the actual bin weight at every location and we can simply sum it.
New_empty_B = New_empty_A * OutTensor
Final_Out = torch.sum(New_empty_B)

Does this workaround make sense given how autograd works?
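A small runnable version of the Operation 2 idea, with hypothetical bin weights `w` and a hypothetical input `x` (the loop is replaced by an equivalent lookup). The forward value is the correct weighted histogram sum, and a gradient does reach `x`. However, since the looked-up factors are constants, that gradient is just weight/value for each element, an artifact of the multiplication rather than a derivative of the histogram. In effect it is a hand-made gradient:

```python
import torch

w = torch.tensor([0.5, 2.0, 3.0])                  # hypothetical weights for bins 0, 1, 2
x = torch.tensor([0., 1., 1., 2.], requires_grad=True)

shifted = x + 1                                    # values 1, 2, 3 (no zeros)
per_value = torch.cat([torch.zeros(1), w / torch.arange(1, 4)])  # [0, w0/1, w1/2, w2/3]
a = per_value[shifted.detach().long()]             # same lookup as the loop; a constant for autograd
out = (a * shifted).sum()                          # sum over bins of count * weight
out.backward()

print(out)     # 1*0.5 + 2*2.0 + 1*3.0 = 7.5
print(x.grad)  # equals a: [0.5, 1.0, 1.0, 1.0]
```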

I’m not sure I completely understand the snippets, but in general weighted sums work and counts don’t. One thing you could do, if you know in which direction the mass should move out of a given histogram bucket, is fake a gradient in that direction.
For histograms, it could also work to use KDE (kernel density estimation) instead, or the Wasserstein distance to a target distribution.
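A sketch of both suggestions, under assumptions not in the thread: a Gaussian kernel with a hypothetical `bandwidth`, fixed bin `centers`, hypothetical weights `w`, and a hypothetical `target` sample of the same size as `x`. The soft histogram gives a smooth stand-in for the weighted histogram sum. For two equal-size 1-D samples, the Wasserstein-1 distance reduces to the mean absolute difference of the sorted values:

```python
import torch

def soft_histogram(x, centers, bandwidth=0.5):
    """KDE-style soft histogram: each sample spreads one unit of mass over the
    bins with a Gaussian kernel, so the per-bin 'counts' are smooth in x."""
    d = x.reshape(-1, 1) - centers.reshape(1, -1)   # (num_samples, num_bins)
    k = torch.exp(-0.5 * (d / bandwidth) ** 2)
    k = k / k.sum(dim=1, keepdim=True)              # each sample contributes total mass 1
    return k.sum(dim=0)                             # soft count per bin

def wasserstein_1d(x, target):
    """1-D Wasserstein-1 distance between two equal-size empirical samples."""
    return (torch.sort(x).values - torch.sort(target).values).abs().mean()

centers = torch.tensor([0., 1., 2.])
w = torch.tensor([0.5, 2.0, 3.0])                   # hypothetical per-bin weights
x = torch.tensor([0., 1., 1., 2.], requires_grad=True)

h = soft_histogram(x, centers)
weighted = (h * w).sum()                            # smooth weighted histogram sum
g_hist, = torch.autograd.grad(weighted, x)

target = torch.tensor([1., 1., 2., 2.])             # hypothetical target, shifted to the right
w1 = wasserstein_1d(x, target)                      # 0.5 for these values
g_w1, = torch.autograd.grad(w1, x)                  # negative entries: descent moves them right
```

Minimizing `w1` pushes the samples toward the target, which is one way to make mass move from left to right without faking gradients.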

Best regards


Yes, we want the mass to move from left to right. Could you please elaborate on how to achieve this?

You could write your own autograd.Function that computes the histogram from the data in forward. In backward you return, e.g., grad_output * negative_sign_of_the_direction_you_want_things_to_move.
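A minimal sketch of that suggestion, assuming integer-valued inputs in `[0, num_bins)` and hypothetical `bin_weights`. The class name `WeightedHistogramSum` and the `direction` argument are illustrative, not a PyTorch API. Forward computes the exact weighted histogram sum; backward ignores the true (zero) derivative and returns `grad_output * -direction` for every element, so a gradient-descent step moves all values in `direction`:

```python
import torch

class WeightedHistogramSum(torch.autograd.Function):
    """Hypothetical sketch: exact weighted histogram sum in forward,
    hand-crafted ('fake') gradient in backward."""

    @staticmethod
    def forward(ctx, x, bin_weights, direction):
        # Count how often each integer value occurs (x assumed integer-valued, >= 0).
        counts = torch.bincount(x.detach().long().flatten(), minlength=bin_weights.numel())
        ctx.x_shape = x.shape
        ctx.direction = direction
        return (counts.to(bin_weights.dtype) * bin_weights).sum()

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient descent does x -= lr * grad, so a gradient of -direction moves x in `direction`.
        ones = torch.ones(ctx.x_shape, dtype=grad_output.dtype, device=grad_output.device)
        return grad_output * (-ctx.direction) * ones, None, None

x = torch.tensor([0., 1., 1., 2.], requires_grad=True)
w = torch.tensor([3.0, 2.0, 1.0])                  # hypothetical per-bin weights
loss = WeightedHistogramSum.apply(x, w, +1.0)      # +1.0: move mass from left to right
loss.backward()

print(loss)    # counts [1, 2, 1] -> 1*3 + 2*2 + 1*1 = 8
print(x.grad)  # tensor([-1., -1., -1., -1.])
```

A more targeted version could scale the fake gradient per element, e.g. only for values in the buckets whose mass should move.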

Best regards


For such operations (counting values exceeding a threshold, weighted sum of a histogram), could we train a neural network that approximates the desired behavior while being compatible with autograd?
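A rough sketch of the surrogate-network idea for the counting case. The input distribution (uniform on [0, 10)), the architecture and the training schedule are all arbitrary assumptions. A small MLP is fitted to the exact, non-differentiable count and then frozen, so that gradients can flow through it to the inputs. Those gradients are only as meaningful as the fit, and away from the threshold they may be arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
TH = 5.0  # threshold from the toy example

# Small MLP mapping a row of 7 values to a (soft) count; the architecture is an arbitrary choice.
surrogate = nn.Sequential(nn.Linear(7, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)

def hard_count(x):
    # Exact, non-differentiable target: number of entries > TH per row, shape (N, 1).
    return (x > TH).sum(dim=1, keepdim=True).float()

def sample_batch(n=256):
    # Assumed input distribution: uniform on [0, 10).
    return torch.rand(n, 7) * 10

x_eval = sample_batch(1024)
with torch.no_grad():
    initial_mse = nn.functional.mse_loss(surrogate(x_eval), hard_count(x_eval)).item()

for step in range(500):
    x = sample_batch()
    loss = nn.functional.mse_loss(surrogate(x), hard_count(x))
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    final_mse = nn.functional.mse_loss(surrogate(x_eval), hard_count(x_eval)).item()

# Freeze the surrogate and use it inside a loss: gradients now reach the inputs.
for p in surrogate.parameters():
    p.requires_grad_(False)
inp = (torch.rand(3, 7) * 10).requires_grad_()
surrogate(inp).sum().backward()
print(initial_mse, final_mse, inp.grad.shape)
```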

We also have an SVD computation in the pipeline. Do we need to include that operation in our NN approach too, or is it differentiable by default? (We couldn’t find a definitive answer in the documentation.)
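For reference, `torch.linalg.svd` is supported by autograd. The PyTorch docs add a caveat: gradients through `U`/`Vh` are only finite when the singular values are not repeated and can be unstable when they are close, whereas `torch.linalg.svdvals` gives numerically stable gradients if only the singular values are needed. The sketch below uses a random full-rank matrix (an assumption) and checks autograd against the known closed form: the gradient of the sum of singular values is `U @ Vh`:

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 3, requires_grad=True)           # random matrix, full rank with probability 1

U, S, Vh = torch.linalg.svd(A, full_matrices=False)  # reduced SVD, recorded by autograd
nuclear_norm = S.sum()                               # sum of singular values
nuclear_norm.backward()

# Closed form for full-rank A with distinct singular values: d(sum of S)/dA = U @ Vh.
expected = (U @ Vh).detach()
print(torch.allclose(A.grad, expected, atol=1e-4))   # True
```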