Manipulating gradients in backward

Part of my architecture is trying to learn to weight a multinomial distribution.
I've included the layer's forward function below.

My problem is that for some reason the backward pass never calls this layer's backward() method, so I cannot weight the gradients as they should be.

I know I can use register_hook to manually fix the gradients of a given layer, but that won't change the gradients of the layers that come before it.

Any advice will be appreciated.

The forward method:

def forward(self, input):
    # one-hot buffer that will mark the sampled class in each row
    self.output = Variable(torch.zeros(input.size()))
    # draw one sample per row; epsilon keeps zero weights valid for multinomial
    self._index = torch.multinomial(input + constants.epsilon, 1, False)
    # write ones at the sampled positions (self.one is a ones tensor defined elsewhere on the module)
    self.output.scatter_(1, self._index, torch.unsqueeze(self.one.repeat(self._index.size()[0]), 1))
    return self._index.float()

According to the docs, the hook …

can optionally return a new gradient with respect to input that will be used in place of grad_input in subsequent computations.

In other words, if your hook returns replacement gradients, those gradients are what get propagated to all the layers that come before it.

Need convincing…

a = Variable(torch.randn(2, 2), requires_grad=True)
m = nn.Linear(2, 1)
m(a).mean().backward()
print(a.grad)
# shows a 2x2 tensor of non-zero values

def hook(module, grad_input, grad_output):
    # replace every entry of grad_input with zeros; for nn.Linear the tuple
    # holds the gradients w.r.t. the three inputs of its backward op
    return (torch.zeros(grad_input[0].size()),
            torch.zeros(grad_input[1].size()),
            torch.zeros(grad_input[2].size()))

m.register_backward_hook(hook)

a.grad.zero_()  # clear the gradients left over from the first backward
m(a).mean().backward()
print(a.grad)
# shows a 2x2 tensor of zeros

Thanks for the reply.

I'm still in a bind, because the weighting of the gradients depends on the result of the multinomial sample.
Specifically my backward is:

self.gradInput.resize_as_(input).zero_()
self.gradInput.copy_(self.output)  # start from the one-hot sample mask
self.gradInput.div_(input)         # divide by the input distribution
self.gradInput.mul_(gradOutput)    # scale by the incoming gradient
return self.gradInput

That means the hook would need access to the layer itself, which gets kind of messy.

Do you have any suggestions as to how to solve this?

Just to check I've understood you correctly… assuming you are calculating the gradient of some loss function, your gradient calculation is this, right?

d_Loss/d_input = (output / input) * d_Loss/d_output

The easiest way to do this would be to implement a custom autograd function rather than using a hook. See http://pytorch.org/docs/0.3.1/notes/extending.html
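
For example, here is a minimal sketch of such a function (written against the static-method Function API; the class name MultinomialST and the epsilon value are placeholders of mine, and the distribution is assumed to live along dim 1). Note that autograd never calls a Module's backward() itself; it only routes gradients through Functions, which is also why your layer's backward is being skipped:

import torch
from torch.autograd import Function

class MultinomialST(Function):

    @staticmethod
    def forward(ctx, input):
        # epsilon keeps rows with zero weights valid for multinomial
        probs = input + 1e-8
        # draw one sample index per row of the (unnormalized) distribution
        index = torch.multinomial(probs, 1, False)
        # one-hot mask of the sampled class in each row
        one_hot = torch.zeros_like(input).scatter_(1, index, 1.0)
        ctx.save_for_backward(input)  # input is an argument of forward, so this is allowed
        ctx.one_hot = one_hot         # intermediates can be stored on ctx directly
        return index.float()

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # d_Loss/d_input = (one_hot / input) * d_Loss/d_output; grad_output has
        # shape (N, 1), so it broadcasts across the class dimension
        return ctx.one_hot / (input + 1e-8) * grad_output

It has to be invoked through MultinomialST.apply, never by calling forward directly, otherwise autograd will not route the backward pass through it (usage shown with the post-0.4 tensor API):

p = torch.rand(4, 5, requires_grad=True)
idx = MultinomialST.apply(p)
idx.sum().backward()
print(p.grad)  # non-zero only at the sampled positions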


I'm trying to extend Function as you've suggested, but I can't seem to make it work.
It seems that I have two main problems:

  1. I cannot save the output variable using ctx.save_for_backward as I get

RuntimeError: save_for_backward can only save tensors, but argument 1 is of type Variable

This happens whether I save output or output.data.

  2. When I only save the input data, autograd doesn't enter the backward function during the backward pass.

Any ideas where I’m going wrong here?

Concerning 1, I have no idea.
Concerning 2, it might help to mark the input as dirty using ctx.mark_dirty(input).
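
For reference, mark_dirty is meant for forwards that modify one of their inputs in place; a toy sketch (not your layer, the doubling is just an arbitrary example):

import torch
from torch.autograd import Function

class InplaceDouble(Function):

    @staticmethod
    def forward(ctx, x):
        x.mul_(2)          # modify the input tensor in place
        ctx.mark_dirty(x)  # tell autograd the input was changed in place
        return x

    @staticmethod
    def backward(ctx, grad_output):
        # d(2x)/dx = 2
        return grad_output * 2

y = torch.randn(3, requires_grad=True)
z = InplaceDouble.apply(y * 1)  # in-place ops need a non-leaf tensor
z.sum().backward()
print(y.grad)  # all twos

Whether that applies here depends on whether your forward actually modifies input in place.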

For future reference:
The problem with 1 is that whatever is passed to ctx.save_for_backward must be an input or an output of the forward method; anything else has to be stored on ctx as a plain attribute.
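(That is the pattern in the MultinomialST sketch above: the input tensor goes through ctx.save_for_backward, while the intermediate one-hot mask is stashed directly on ctx.)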

Regarding 2, I will update when I find a solution.
For now your suggestions seem to be the right approach, so I'll mark them as the solution, and I'll update with my results once I get it working.