Backward hooks changing order of execution in nn.Sequential

I’m working on a synthetic gradient approach and I have this simple model:

model = nn.Sequential(OrderedDict([('lin1', nn.Linear(2, 4)),
                                   ('relu1', nn.ReLU(inplace=True)),
                                   ('ef1', EF.Identity()),
                                   ('lin2', nn.Linear(4, 3)),
                                   ('relu2', nn.ReLU(inplace=True)),
                                   ('ef2', EF.Identity()),
                                   ('lin3', nn.Linear(3, 2))
                                   ]))

EF.Identity is a custom layer that simply passes its input through unchanged in the forward pass. I use it to register a backward hook in which I overwrite gradInput. I then check (by printing) that the gradInput of ‘ef1’ equals the gradOutput of ‘relu1’.
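For reference, the hooks look roughly like this (simplified sketches, not the exact EF code; the overwritten gradient here is just a placeholder):

def print_grad(module, grad_input, grad_output):
    # Print whatever the backward hook receives for this module.
    print(module, grad_input, grad_output)

def hack_grad(module, grad_input, grad_output):
    # Overwrite gradInput: returning a tuple from a module backward hook
    # replaces grad_input with the returned values.
    return tuple(g * 2 if g is not None else None for g in grad_input)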

model.relu1.register_backward_hook(EF.print_grad)
model.ef1.register_backward_hook(EF.hack_grad)

Then I create dummy input and target variables just to test:

dummy = Variable(torch.rand(5, 2))
target = Variable(torch.zeros(5).long())
y_hat = model(dummy)
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(y_hat, target)
loss.backward()

But I notice that in the backward pass, the hook on ‘relu1’ is run before the hook on ‘ef1’. Is this because of some optimization that is changing the op graph? If so, is there a way to disable this behavior? It’s really important for my method that the backward operations run in the reverse order of the forward pass.

Well, for now I cheated by adding a dummy multiplication by one in the forward pass, but I’m still curious to hear the answer.
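Concretely, the cheat is something along these lines (a sketch of the idea, not the exact EF code): the multiplication forces the module to produce a new output Variable instead of returning its input.

class Identity(nn.Module):
    def forward(self, inp):
        # Multiplying by one creates a new Variable, so this module's
        # backward hook no longer ends up on the same Variable as relu1's.
        return inp * 1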

Hi,

Here is a self-contained example to show the behaviour.
Removing the clone in the custom Identity reproduces the behaviour you describe, right?

import torch
from torch import nn
from torch.autograd import Variable
from collections import OrderedDict

def print_grad(mod):
    # Returns a backward hook that tags its printout with the module name.
    def tmp(*args):
        print("grad for {}".format(mod))
        print(args)
    return tmp

class Identity(nn.Module):
    def forward(self, inp):
        # Remove the following line to get the weird behaviour: without the
        # clone, the output is the exact same Variable as the input.
        inp = inp.clone()
        return inp

model = nn.Sequential(OrderedDict([('lin1', nn.Linear(2, 4)),
                                   ('relu1', nn.ReLU(inplace=True)),
                                   ('ef1', Identity()),
                                   ('lin2', nn.Linear(4, 3)),
                                   ('relu2', nn.ReLU(inplace=True)),
                                   ('ef2', Identity()),
                                   ('lin3', nn.Linear(3, 2))
                                   ]))

model.relu1.register_backward_hook(print_grad("relu1"))
model.ef1.register_backward_hook(print_grad("ef1"))

dummy = Variable(torch.rand(5, 2))
target = Variable(torch.rand(5).zero_().long())
y_hat = model(dummy)
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(y_hat, target)
loss.backward()

The reason why this happens is the way backward hooks are implemented for nn.Module.
Basically, the module hook is implemented by attaching a hook to the module’s output Variable after the forward pass.
Since in your case the forward pass does not create a new Variable, both module hooks get attached to the same Variable, and hooks on a Variable run in the order they were added: the hook corresponding to relu1 was added first and so is called first, while the one corresponding to ef1 was added later and so is called later.
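You can check this directly (a quick aside, not part of the example above): when forward just returns its input, the module’s “output” is literally the same Variable object as its input, so hooks from different modules pile up on one Variable.

class PlainIdentity(nn.Module):
    # An identity module without the clone.
    def forward(self, inp):
        return inp

inp = Variable(torch.rand(5, 2))
out = PlainIdentity()(inp)
print(out is inp)  # True: the output is the very same Variable as the
                   # input, so module hooks from different layers end up
                   # attached to a single Variable.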

Not sure if this is a bug or not…
cc: @apaszke do we want to fix this behaviour? Can we actually do it?

Yes, that’s the behavior I was observing. Thank you very much for your answer.

I’m doing something very exotic, so I don’t think a lot of other people will be bothered by this.