Hi all,

I’m trying to backprop a model using the gradients acquired during the forward and backward pass of a copy of the exact same model. I capture the gradient with a hook:

```
# define the models
class ANet(nn.Module): ...

class BNet(nn.Module):
    def __init__(self):
        super().__init__()
        ... layers ...
        self.grad = None

    def forward(self, x):
        # assignment isn't allowed inside a lambda, so use setattr
        # to store dLoss/dx (the gradient w.r.t. BNet's input)
        x.register_hook(lambda g: setattr(self, "grad", g))
        ... layers ...
        return x
```
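As a sanity check, here is a standalone toy version (not the actual models) of how I understand the hook mechanism: a hook registered on an intermediate tensor receives dLoss/d(tensor) as a plain tensor.

```python
import torch

# Toy check: capture the gradient flowing into an intermediate tensor.
captured = {}

x = torch.randn(3, requires_grad=True)
y = x * 2.0                                         # intermediate tensor
y.register_hook(lambda g: captured.update(grad=g))  # stores dLoss/dy
loss = y.sum()
loss.backward()

print(captured["grad"])  # dLoss/dy = ones(3) here
```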

Running this model yields the proper gradient:

```
# instantiate models
a1net = ANet()
a2net = deepcopy(a1net)
bnet = BNet()
# forward and back prop a1 and b
X, y = next(iter(train_loader))
yhat = bnet(a1net(X))
loss = F.nll_loss(yhat, y)
loss.backward()
```

Afterwards, I’d like to update `a2net` using the extracted gradient. I usually just call `.backward()` to kick off backpropagation through autograd, but here I need to apply the gradient captured in the previous pass.

```
# backprop a2, this is where I'm stuck
grad = bnet.grad
a2net.backward(grad)...?
```

I’ve thought of using `torch.autograd.backward(tensors=a1net, grad_tensors=grad)`, but that didn’t work.
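For reference, here is a standalone sketch of the kind of call I think is needed (a toy `nn.Linear` stands in for `a2net`; I’m assuming `Tensor.backward(gradient=...)` is the way to seed backprop for a non-scalar output):

```python
import torch

net = torch.nn.Linear(4, 2)    # stand-in for a2net
X = torch.randn(5, 4)
out = net(X)                   # non-scalar output, shape (5, 2)
grad = torch.randn_like(out)   # stand-in for the extracted bnet.grad

# Seed backprop with the externally supplied gradient; the equivalent
# functional form would be:
# torch.autograd.backward(tensors=[out], grad_tensors=[grad])
out.backward(gradient=grad)

print(net.weight.grad.shape)   # torch.Size([2, 4])
```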

Comparing my extracted gradient with the output of `a1net(X)` shows that my grad lacks a reference to the graph in `grad.grad_fn`, and I also wouldn’t know how to explicitly create a grad for a non-scalar output.
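To illustrate the `grad_fn` mismatch I’m seeing, in a toy setup the activation has a `grad_fn` but the hooked gradient doesn’t (it comes out as a plain tensor detached from any graph):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2.0
saved = {}
y.register_hook(lambda g: saved.update(g=g))
y.sum().backward()

print(y.grad_fn is None)           # False: the activation is part of the graph
print(saved["g"].grad_fn is None)  # True: the hooked gradient carries no graph
```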

Most related questions I’ve found here are about retaining graphs, but always within a single model.

Best,