Unexpected behaviour from trying to control model saliency

I have a model where, during training, the human user selects parts of the image that they don't deem important but that were still included in the saliency map. I then multiply the selected pixels with the gradient and backpropagate the result, hoping to make the model stop taking those parts of the image into account.

The code of that part looks like this:

Then I check the resulting gradients and zero the weights whose gradient is too high (i.e. the ones that took the marked pixels into account). I get that I'm overfitting here, but even for a single image the next saliency map often doesn't remove the marked regions. Instead it returns a weird reversed saliency map, where every image part that was salient now isn't and vice versa. Any explanations as to why, and how I can achieve what I want?

Thanks a lot in advance

I don't necessarily need people who know the answer; any ideas are very much welcome too.

OK, so I realized it doesn't backpropagate properly because it doesn't manage to backpropagate to the marked pixels. How can I make it so that only the gradients of the marked pixels go into the loss?

Do you have more code illustrating the problem? Looking only at the snippet you posted, it should backprop toward marked_pixels if they require grad.

Sure. I see two possible reasons why it doesn't backpropagate.

  1. It actually doesn't backpropagate, and here is the part of the code where I try to do it:

The other possibility is that I'm making a mistake when checking whether I actually backpropagate to self.marked_pixels. The way I check is to create two losses, one with and one without marked_pixels, and compare the resulting gradients; there are no differences.
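For anyone trying to reproduce that check, here is a minimal sketch of the with/without comparison. The model, image, and mask are hypothetical stand-ins, not the code from this thread. It illustrates one possible explanation for seeing no difference: if the input gradient is computed without create_graph=True, the masked term is a constant as far as autograd is concerned and adds nothing to the weight gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real model, image, and user mask.
model = nn.Sequential(nn.Flatten(), nn.Linear(16, 8), nn.Tanh(), nn.Linear(8, 3))
image = torch.rand(1, 1, 4, 4, requires_grad=True)
target = torch.tensor([1])
marked_pixels = torch.zeros(1, 1, 4, 4)
marked_pixels[..., :2, :2] = 1.0  # 1 where the user marked a region


def weight_grads(use_mask, create_graph):
    """Return the flattened weight gradients of the loss."""
    model.zero_grad()
    task_loss = nn.functional.cross_entropy(model(image), target)
    loss = task_loss
    if use_mask:
        (input_grad,) = torch.autograd.grad(
            task_loss, image, create_graph=create_graph, retain_graph=True
        )
        # Masked saliency term: only marked pixels contribute.
        loss = task_loss + (input_grad * marked_pixels).pow(2).sum()
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])


baseline = weight_grads(use_mask=False, create_graph=False)
detached = weight_grads(use_mask=True, create_graph=False)
attached = weight_grads(use_mask=True, create_graph=True)

print(torch.allclose(baseline, detached))  # mask term had no effect
print(torch.allclose(baseline, attached))  # mask term changed the gradients
```

If the first comparison matches and the second does not, the check itself is fine and the missing piece is the graph of the input gradient.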

Also, backpropagating only to the current gradients would explain why the saliency map gets inverted: you punish the weights that encourage the current gradients and encourage the other ones. Maybe it doesn't backpropagate because marked_pixels isn't a continuous function? If so, is there a way to still add only parts of the input gradients to the loss?
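On the continuity question, a small sketch with made-up numbers may help. It assumes the mask is a plain 0/1 tensor that is only multiplied in. In that case the mask never needs a gradient of its own; it just gates which saliency entries receive one.

```python
import torch

# Hypothetical saliency values for a four-pixel "image".
saliency = torch.tensor([0.5, 1.0, 2.0, 0.25], requires_grad=True)
# Constant 0/1 mask: no requires_grad, nothing flows *into* it.
marked_pixels = torch.tensor([1.0, 0.0, 1.0, 0.0])

penalty = (saliency * marked_pixels).pow(2).sum()
penalty.backward()

print(marked_pixels.requires_grad)  # False
print(saliency.grad)  # tensor([1., 0., 4., 0.])
```

Only the marked positions get a non-zero gradient, so a discrete mask by itself should not block backpropagation.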

Edit: Maybe I should go back to calculating the Hessian for only the marked pixels?
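A possible alternative to building the Hessian explicitly, sketched under assumptions: take the input gradient with torch.autograd.grad(..., create_graph=True), mask it, and add it to the loss. Autograd then handles the second-order step when backward() is called. The model, data, mask, and penalty_weight below are all hypothetical placeholders, and squaring the masked gradient is just one choice of penalty.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real model, image, and user mask.
model = nn.Sequential(nn.Flatten(), nn.Linear(16, 8), nn.Tanh(), nn.Linear(8, 3))
image = torch.rand(1, 1, 4, 4, requires_grad=True)
target = torch.tensor([1])
marked_pixels = torch.zeros(1, 1, 4, 4)
marked_pixels[..., :2, :2] = 1.0  # 1 where the user marked a region

task_loss = nn.functional.cross_entropy(model(image), target)

# create_graph=True keeps the graph of the gradient itself, so a loss
# built from it can be backpropagated to the weights.
(input_grad,) = torch.autograd.grad(task_loss, image, create_graph=True)

# Only the marked pixels contribute to the saliency penalty.
saliency_penalty = (input_grad * marked_pixels).pow(2).sum()

penalty_weight = 10.0  # hypothetical value, would need tuning
total_loss = task_loss + penalty_weight * saliency_penalty
total_loss.backward()  # weight grads now include the masked saliency term
```

After this, an ordinary optimizer step would push the weights toward lower saliency on the marked region while still fitting the task, rather than zeroing weights by hand.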