Essentially, I have two layers, L1 and L2 (independent of each other). While updating the weights of L1, I also want to use the gradient of the loss with respect to the weights of L2. One way I found of doing this is updating all the weights manually, but since my actual code has a lot of layers, this is very inefficient and time-consuming. Is there any other way?

What do you mean by “use the gradient of loss with respect to weights of L2 also”? Do the weights of L2 have the same shape as those of L1 (and do you want to use the gradients of the weights of L2 *instead of* the gradients of the weights of L1 to optimize)?

L1 and L2 have the same shape. I want to sum the gradients of L1 and L2 and use that summed gradient to optimize both layers.

Okay. You could manually set the `.grad` attribute on your parameters and then call an optimizer step.
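A minimal sketch of that idea, assuming two same-shaped `nn.Linear` layers (the layer sizes and loss here are made up for illustration): compute each layer's gradients with `torch.autograd.grad`, assign the sum to both layers' `.grad` attributes, then let the optimizer apply the update.

```python
import torch
import torch.nn as nn

# Hypothetical setup: two independent layers with identical shapes.
l1 = nn.Linear(4, 4)
l2 = nn.Linear(4, 4)
opt = torch.optim.SGD(list(l1.parameters()) + list(l2.parameters()), lr=0.1)

x = torch.randn(8, 4)
loss = l1(x).sum() + l2(x).sum()  # stand-in loss involving both layers

# Compute per-layer gradients without touching .grad yet.
# retain_graph=True keeps the graph alive for the second call.
g1 = torch.autograd.grad(loss, list(l1.parameters()), retain_graph=True)
g2 = torch.autograd.grad(loss, list(l2.parameters()))

# Assign the summed gradient to both layers, then step the optimizer.
for p1, p2, ga, gb in zip(l1.parameters(), l2.parameters(), g1, g2):
    p1.grad = ga + gb
    p2.grad = ga + gb
opt.step()
```

The optimizer only reads whatever is in `.grad` at `step()` time, so setting it manually like this avoids rewriting the update rule per parameter.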