Double backpropagation issue

Hi,

I have a network (dnn_Linear) with output v3. I would first like to estimate a variable ss, which is the Jacobian of this network w.r.t. its parameters. Then I would like to backpropagate the loss and later mutate the gradients by multiplying them with ss.

Now I get the error:
"Trying to backward through the graph a second time"

The relevant part of the script is:

ss = torch.autograd.grad(v3, dnn_Linear.parameters(), grad_outputs=torch.ones_like(v3), retain_graph=None, create_graph=False, only_inputs=True, allow_unused=True)  # jacobian
Total_loss.backward(retain_graph=True)  # backpropagate
v3.grad = torch.tensor(ss).T * v3.grad  # mutate part

How can I achieve this?

Thanks
Sal

Use retain_graph=True in the torch.autograd.grad call and remove it from the backward call, assuming you don't want to backpropagate another time.
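
A minimal sketch of that change, reusing the names from the snippet above (v3, dnn_Linear, and Total_loss come from the original post):

ss = torch.autograd.grad(v3, dnn_Linear.parameters(),
                         grad_outputs=torch.ones_like(v3),
                         retain_graph=True,   # keep the graph alive for the later backward call
                         allow_unused=True)   # jacobian terms
Total_loss.backward()  # no retain_graph needed here if this is the last backward pass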

Thanks. But I do need to backpropagate another time. Yes, this is solved. Thanks.

However, for the last part

v3.grad = ss.T * v3.grad  # mutate part

I get the following error:
"'tuple' object has no attribute 'T'"

What are the meaningful values of ss in this tuple?

Thanks

torch.autograd.grad returns the sum of gradients of outputs with respect to the inputs as a tuple.
If a single input is passed, the tuple will contain a single element:

import torch
import torch.nn as nn

model = nn.Linear(10, 10)
x = torch.randn(1, 10)
out = model(x)

# gradient of the scalar out.mean() w.r.t. a single input (the bias)
grads = torch.autograd.grad(out.mean(), model.bias)
print(grads)
# (tensor([0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000, 0.1000,
#          0.1000]),)

Thanks. Not sure if it is relevant to this thread, but the ss tuple has elements with shapes (16, 3, 3, 3) and (16,).

I want to multiply ss with dnn_Linear.parameters() before optimizer.step().

Thanks

The grads tuple will correspond to the order in which dnn_Linear.parameters() returns the parameters, so you could iterate over both and apply your update rule.
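
For example, a quick way to check the pairing (a self-contained sketch with a small stand-in model):

import torch
import torch.nn as nn

model = nn.Linear(10, 10)
out = model(torch.randn(1, 10))

# grads come back in the same order as model.parameters(): here (weight, bias)
grads = torch.autograd.grad(out.mean(), model.parameters())
for param, grad in zip(model.parameters(), grads):
    print(param.shape, grad.shape)
# torch.Size([10, 10]) torch.Size([10, 10])
# torch.Size([10]) torch.Size([10])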

Thanks. It would help if there were an example of multiplying ss with dnn_Linear.parameters(). ss is a tuple of size 2. Does that mean dnn_Linear.parameters() also has two elements? If yes, how can we multiply both before optimizer.step()?

Something like this would work:

import torch
import torch.nn as nn

model = nn.Linear(10, 10)
x = torch.randn(1, 10)
out = model(x)

# gradients w.r.t. all parameters, returned in parameters() order
grads = torch.autograd.grad(out.mean(), model.parameters())

for grad, param in zip(grads, model.parameters()):
    with torch.no_grad():
        # apply the manual update rule in-place (here: subtract the gradient)
        param.sub_(grad)

but you won’t be able to use an optimizer anymore, since you are now manually manipulating the parameters, making the forward activations stale.
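
If the intent is instead to scale the gradients (as in the first post) rather than the parameters themselves, one option, sketched below with a stand-in model, is to modify each param.grad before optimizer.step(), so the optimizer and its learning rate still apply:

import torch
import torch.nn as nn

model = nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

out = model(torch.randn(1, 10))
loss = out.mean()

# Jacobian-like terms; retain_graph=True so the graph survives for loss.backward()
ss = torch.autograd.grad(loss, model.parameters(), retain_graph=True)

loss.backward()  # populates param.grad

with torch.no_grad():
    for param, s in zip(model.parameters(), ss):
        param.grad.mul_(s)  # scale each gradient elementwise before the update

optimizer.step()  # applies the scaled gradients together with the learning rate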

Thanks. I wonder what sub_ does here. What does param.sub_(grad) do?

Can I not do optimizer.step() after this? If not, how would the learning rate come into play?

Can you please explain this step in more detail? Thanks.

@ptrblck, thanks for your help. The only thing I would like to add here is that I wanted to multiply and still backpropagate a second time.
So my corrected script, after your help, is:

Total_loss.backward(retain_graph=True)  # backpropagate
for grad, param in zip(grads, model.parameters()):
    # I am retaining the graph, so an in-place update of the parameter would not work;
    # assigning to param.data writes the new values without autograd tracking the change
    pp = param * grad
    param.data = pp
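
Putting the pieces together, a self-contained sketch of the full flow (the model, loss, and shapes here are stand-ins for dnn_Linear, Total_loss, and ss from the thread):

import torch
import torch.nn as nn

dnn_Linear = nn.Linear(10, 10)
x = torch.randn(4, 10)
v3 = dnn_Linear(x)
Total_loss = (v3 ** 2).mean()

# per-parameter gradient terms of the output (the "ss" tuple);
# retain_graph=True keeps the graph alive for the later backward call
ss = torch.autograd.grad(v3, dnn_Linear.parameters(),
                         grad_outputs=torch.ones_like(v3),
                         retain_graph=True, allow_unused=True)

Total_loss.backward(retain_graph=True)  # backpropagate the loss

# multiply each parameter by its matching element of ss,
# writing the result back via .data so the retained graph is not invalidated
for grad, param in zip(ss, dnn_Linear.parameters()):
    if grad is not None:
        param.data = param.data * grad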