Double propagation issue


I have a network (dnn_Linear) with output v3. I would first like to estimate a variable ss, the Jacobian of this network w.r.t. its parameters. Then I would like to backpropagate the loss, and later mutate the gradients by multiplying them with ss.

Now I get the error:
"Trying to backward through the graph a second time"

The relevant part of the script is:
ss = torch.autograd.grad(v3, dnn_Linear.parameters(), grad_outputs=torch.ones_like(v3), retain_graph=None, create_graph=False, only_inputs=True, allow_unused=True) # jacobian
Total_loss.backward(retain_graph=True) # back propagate
v3.grad = torch.tensor(ss).T * v3.grad # mutate part

How can I achieve this?


Use retain_graph=True in the torch.autograd.grad call and remove it from the backward call assuming you don’t want to backpropagate another time.
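A minimal sketch of that fix, with made-up shapes and loss (dnn_Linear, v3, and Total_loss here are stand-ins for the poster's actual variables):

```python
import torch
import torch.nn as nn

# Stand-ins for the poster's network, output, and loss
dnn_Linear = nn.Linear(3, 3)
x = torch.randn(4, 3)
v3 = dnn_Linear(x)
Total_loss = v3.pow(2).mean()

# retain_graph=True here keeps the graph alive for the backward call below
ss = torch.autograd.grad(
    v3, dnn_Linear.parameters(),
    grad_outputs=torch.ones_like(v3),
    retain_graph=True,
    allow_unused=True)  # tuple: one gradient per parameter

# the graph is still intact, so this no longer raises the "second time" error
Total_loss.backward()
```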

Thanks, but I do need to backpropagate another time. Yes, this is solved. Thanks.

However, for the last part:
v3.grad = ss.T * v3.grad # mutate part
I get the following error:
"'tuple' object has no attribute 'T'"

What are the meaningful values of ss in this tuple?


torch.autograd.grad returns the sum of gradients of outputs with respect to the inputs as a tuple.
If a single input is passed, the tuple will contain a single element:

import torch
import torch.nn as nn

model = nn.Linear(10, 10)
x = torch.randn(1, 10)
out = model(x)

grads = torch.autograd.grad(out.mean(), model.bias)
# (tensor([0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000,
#          0.1000]),)

Thanks. Not sure if it is relevant to this thread. The ss tuple contains tensors of shape (16, 3, 3, 3) and (16,).

I want to multiply ss with the dnn_Linear.parameters(), before optimizer.step()


The grads tuple will correspond to the same order dnn_Linear.parameters() is returning the parameters, so you could iterate both and apply your update rule.

Thanks. It would help if there were an example of multiplying ss with dnn_Linear.parameters(). ss is a tuple of size 2. Does that mean dnn_Linear.parameters() is also a tuple of the same size? If yes, how can we multiply both before optimizer.step()?

Something like this would work:

import torch
import torch.nn as nn

model = nn.Linear(10, 10)
x = torch.randn(1, 10)
out = model(x)

grads = torch.autograd.grad(out.mean(), model.parameters())

for grad, param in zip(grads, model.parameters()):
    with torch.no_grad():
        param.sub_(grad)
but you won’t be able to use an optimizer anymore, since you are now manually manipulating the parameters, making the forward activations stale.
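For illustration, here is one way such a manual update could fold in a learning rate; the SGD-style rule param.sub_(lr * grad) and the lr value are assumptions, not a prescribed recipe:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)
x = torch.randn(1, 10)
out = model(x)

grads = torch.autograd.grad(out.mean(), model.parameters())

lr = 0.1  # assumed learning rate, chosen by hand since no optimizer is used
with torch.no_grad():
    for grad, param in zip(grads, model.parameters()):
        # in-place SGD-style step: param <- param - lr * grad
        param.sub_(lr * grad)
```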

Thanks. I wonder what sub_ does here. What does param.sub_(grad) do?

Can I not do optimizer.step() after this? If not, how does the learning rate come into play?

Can you please explain this step more. Thanks.

@ptrblck, thanks for your help. The only thing I would like to add here is that I wanted to multiply and yet double-propagate.
So my correct script after your help is:

Total_loss.backward(retain_graph=True) # back propagate
for grad, param in zip(grads, model.parameters()):
    pp = param * grad # I am retaining the graph, so an in-place operation would not work
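Putting the thread together, an end-to-end sketch might look like this; the layer sizes, the input, and the loss are invented for illustration, and the products are collected into a list rather than written back in place, since the graph is retained:

```python
import torch
import torch.nn as nn

# Stand-ins for the poster's network, output, and loss
dnn_Linear = nn.Linear(3, 3)
x = torch.randn(4, 3)
v3 = dnn_Linear(x)
Total_loss = v3.pow(2).mean()

# per-parameter gradients of v3; retain_graph=True keeps the graph alive
ss = torch.autograd.grad(
    v3, dnn_Linear.parameters(),
    grad_outputs=torch.ones_like(v3),
    retain_graph=True)

Total_loss.backward(retain_graph=True)  # back propagate

# mutate out of place: each product is a fresh tensor, so the
# retained graph and the original parameters are untouched
mutated = [param * grad for grad, param in zip(ss, dnn_Linear.parameters())]
```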