Part of my architecture is trying to learn to weight a multinomial distribution.
Below is the forward function of the layer.
My problem is that, for some reason, the backward pass never reaches this layer's backward() method, so I cannot weight the gradients as intended.
I know I can use register_hook to manually adjust the gradients at a certain layer, but that won't change the gradients of all the layers that come before it.
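For reference, here is a minimal sketch of the kind of behaviour I'm after (class and variable names here are made up): an identity op whose backward scales the incoming gradient by a weight vector, so that everything upstream of it also sees the reweighted gradient.

```python
import torch

class WeightedGrad(torch.autograd.Function):
    """Identity in forward; scales gradients by `weights` in backward.

    Because the scaling happens inside backward(), the reweighted
    gradient is what autograd propagates to all earlier layers."""

    @staticmethod
    def forward(ctx, x, weights):
        ctx.save_for_backward(weights)
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        (weights,) = ctx.saved_tensors
        # One gradient per forward input; `weights` needs no gradient.
        return grad_output * weights, None

x = torch.ones(3, requires_grad=True)
w = torch.tensor([0.5, 1.0, 2.0])
y = WeightedGrad.apply(x, w)
y.sum().backward()
# x.grad is now [0.5, 1.0, 2.0] instead of [1.0, 1.0, 1.0]
```

Note that calling the op via `WeightedGrad.apply(...)` (rather than `forward` directly) is what registers the custom backward with autograd.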
Just to check I've understood you correctly: assuming you are calculating the gradient of some loss function, your gradient calculation is this, right?
For future reference:
The problem with option 1 is that whatever is saved with ctx.save_for_backward must also be an output of the forward method.
Regarding option 2, I will update this thread when I find a solution.
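(For anyone hitting the same constraint: one commonly suggested workaround, which I haven't verified in my own setup, is to stash the intermediate tensor directly on ctx as a plain attribute instead of going through save_for_backward. A minimal sketch with made-up names:)

```python
import torch

class SaveIntermediate(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        inter = x * 2.0  # an intermediate that is NOT a forward output
        # save_for_backward is intended for inputs/outputs; storing the
        # tensor directly on ctx sidesteps that restriction (at the cost
        # of losing save_for_backward's sanity checks).
        ctx.inter = inter
        return inter + 1.0

    @staticmethod
    def backward(ctx, grad_output):
        # The stashed intermediate is available here; d(out)/dx = 2.
        return grad_output * torch.ones_like(ctx.inter) * 2.0

x = torch.ones(2, requires_grad=True)
SaveIntermediate.apply(x).sum().backward()
# x.grad == [2., 2.]
```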
For now your suggestions seem to be the right solution, so I'll mark them as such, but I'll update with my results in the future once I get it working.