Part of my architecture is trying to learn to weight a multinomial distribution.
Below is the forward function of the layer.
My problem is that for some reason the backward pass never reaches this layer's backward() method, so I cannot weight the gradients as they should be.
I know I can use register_hook to manually fix the gradients of a certain layer, but it won't change the gradients of all the layers that come before it.
Any advice will be appreciated.
def forward(self, input):
    self.output = Variable(torch.zeros(input.size()))
    self._index = torch.multinomial(input + constants.epsilon, 1, False)
    self.output.scatter_(1, self._index, torch.unsqueeze(self.one.repeat(self._index.size()), 1))
    return self.output
According to the docs the hook …
can optionally return a new gradient with respect to input that will be used in place of
grad_input in subsequent computations.
In other words, if your hook returns fixed gradients, then those fixed gradients are what gets propagated to all the layers that come before it.
import torch
import torch.nn as nn
from torch.autograd import Variable

def hook(module, grad_input, grad_output):
    # replace gradients with zeros
    return tuple(g * 0 if g is not None else None for g in grad_input)

a = Variable(torch.randn(2, 2), requires_grad=True)
m = nn.Linear(2, 1)
m(a).sum().backward()
print(a.grad)  # shows a 2x2 tensor of non-zero values

a.grad.data.zero_()
m.register_backward_hook(hook)
m(a).sum().backward()
print(a.grad)  # shows a 2x2 tensor of zeros
Thanks for the reply.
I'm still in a bind because the weighting of the gradients depends on the result of the multinomial.
Specifically, my backward is:
And that means the hook would need access to the layer itself, which gets kinda messy.
Do you have any suggestions as to how to solve this?
Just to check I have understood you correctly… assuming you are calculating the gradient of some Loss function… your gradient calculation is this, right?
d_Loss/d_input = (output / input) * d_Loss/d_output
The easiest way to do this would be to implement a custom function rather than using a hook. See http://pytorch.org/docs/0.3.1/notes/extending.html
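For what it's worth, here is a sketch of what such a custom Function could look like for this sampling layer (the class name and epsilon value are placeholders, not from the thread; the backward applies the output / input weighting discussed above):

```python
import torch

class StochasticChoice(torch.autograd.Function):
    """Samples a one-hot vector from each row of probabilities (hypothetical name)."""

    @staticmethod
    def forward(ctx, input):
        eps = 1e-8  # stand-in for constants.epsilon
        # sample one index per row, without replacement
        index = torch.multinomial(input + eps, 1, False)
        output = torch.zeros_like(input)
        output.scatter_(1, index, 1.0)  # one-hot encode the sampled index
        ctx.save_for_backward(input, output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, output = ctx.saved_tensors
        # weight the upstream gradient by output / input
        return (output / input) * grad_output
```

Calling it via StochasticChoice.apply(probs) registers the custom backward in the graph, so the backward pass actually reaches it, unlike a backward() method defined on a plain nn.Module.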
I’m trying to extend Function as you’ve suggested but I can’t seem to make it work.
It seems that I have two main problems:
- I cannot save the output variable using ctx.save_for_backward; whether I save output or output.data I get
  RuntimeError: save_for_backward can only save tensors, but argument 1 is of type Variable
- When I only save the input data, autograd doesn't enter the backward function when I call backward.
Any ideas where I’m going wrong here?
Concerning 1. I have no idea.
Concerning 2. it might help to mark the input as dirty using ctx.mark_dirty(input).
For future reference:
The problem with 1 is that whatever is saved with ctx.save_for_backward must also be an output of the forward method.
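In case it helps others hitting the same restriction: a common workaround is to stash an intermediate tensor that is neither an input nor an output of forward as a plain attribute on ctx instead of going through save_for_backward. A minimal sketch with hypothetical names:

```python
import torch

class NoisyScale(torch.autograd.Function):
    """Hypothetical example: the random weight is created inside forward,
    so it is not an input or output of forward; stash it on ctx instead."""

    @staticmethod
    def forward(ctx, input):
        weight = torch.rand_like(input)
        ctx.weight = weight  # plain attribute, bypasses save_for_backward
        return input * weight

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * ctx.weight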
Regarding 2 I will update when I find a solution.
For now it seems your suggestions are the right solution so I’ll mark them as such, but will update my results in the future when I get it to work