# Loss.backward() for two different nets

Hi, let’s say I have two networks, “net1” and “net2”, with “loss1” and “loss2” as their respective losses, and “optimizer1” and “optimizer2” as their respective optimizers.

My losses are computed as:
```python
loss1 = criterion(outputs1, labels1)
loss2 = criterion(outputs2, labels2)
```

Now, I want to backprop the loss to net1 and net2. I use:
```python
loss1.backward()
loss2.backward()
optimizer1.step()
optimizer2.step()
```

Is this the correct way? I am just confused about how loss1 is associated with net1 for computing gradients, and loss2 with net2. I know that loss1 is computed from the outputs of net1, but I want to ask how the gradients for net1 are computed when I call “loss1.backward()”.

My goal: compute loss1 from net1 and backprop only to net1, and the same for net2.

Hi,

The trick we use is to store, with each Tensor, enough information about how it was created to be able to compute gradients.
So `loss1` “knows” that it was computed from the parameters of net1, and when you call `.backward()`, it will compute the gradients w.r.t. those parameters, regardless of whatever you do on the side with other nets.
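To make that concrete, here is a minimal sketch (using two tiny `nn.Linear` modules as hypothetical stand-ins for net1 and net2) showing that calling `backward()` on one loss only populates gradients for that net's parameters:

```python
import torch
import torch.nn as nn

# Hypothetical minimal stand-ins for net1 and net2
net1 = nn.Linear(4, 1)
net2 = nn.Linear(4, 1)

x = torch.randn(2, 4)
loss1 = net1(x).sum()
loss2 = net2(x).sum()

loss1.backward()  # autograd walks back through net1 only

# net1's parameters now have gradients...
assert all(p.grad is not None for p in net1.parameters())
# ...while net2's parameters are untouched (still None)
assert all(p.grad is None for p in net2.parameters())
```

Even though `loss2` exists, nothing happens to net2 until you call `loss2.backward()`, because `loss1`'s computation graph never touched net2's parameters.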


PyTorch comes with a component called autograd, which provides automatic differentiation for all operations on Tensors; Tensors can remember where they “came from”.

From the PyTorch docs:

`torch.Tensor` is the central class of the package. If you set its attribute `.requires_grad` as `True` , it starts to track all operations on it. When you finish your computation you can call `.backward()` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into `.grad` attribute.
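A minimal illustration of that quote (a toy tensor, not from the thread), showing both the automatic gradient computation and the accumulation into `.grad`:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)

y = (x ** 2).sum()  # y = x0^2 + x1^2; operations on x are tracked
y.backward()        # dy/dx = 2*x
print(x.grad)       # tensor([4., 6.])

# Gradients ACCUMULATE into .grad across backward calls,
# which is why training loops call optimizer.zero_grad()
y2 = (x ** 2).sum()
y2.backward()
print(x.grad)       # tensor([8., 12.])
```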

When a neural network is defined in PyTorch, it subclasses `torch.nn.Module`. All your submodules and layers can be initialized in the module, which leads to them (and their parameters) being tracked by the `Module`.

Let us say `Net1` looked like this (subclassing `nn.Module`)

```python
import torch.nn as nn
import torch.nn.functional as F

class Net1(nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))
```

Then you instantiate the model and call it to get an output:

```python
net1 = Net1()
output = net1(x)
```

This output has been propagated through the `forward()` pass of `Net1` (among other methods). Then you calculate the loss:

```python
loss1 = criterion(outputs1, labels1)
```

Now, when we call the `.backward()` method on the loss, autograd will backpropagate through the tensors which have `requires_grad` set to `True` and calculate the gradients w.r.t. the parameters all the way back to where they came from.

Then when you call `optimizer1.step()`, it will look at each parameter's `.grad` and update the parameter's value; for plain SGD this means subtracting the `learning_rate` times the gradient from it.
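As a sketch of that update rule, with a hypothetical single-parameter model and plain SGD (no momentum):

```python
import torch
import torch.nn as nn

# A hypothetical one-parameter "model"
w = nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([w], lr=0.1)

loss = (w * 3.0).sum()  # d(loss)/dw = 3
loss.backward()         # w.grad is now tensor([3.])
opt.step()              # w <- w - lr * w.grad = 1.0 - 0.1 * 3.0 ≈ 0.7

print(w.data)
```

This is exactly `param = param - learning_rate * param.grad`, applied to every parameter the optimizer was constructed with.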

The key here is that Tensors know where they came from, so each net is backpropagated through correctly: both your nets receive exactly the gradients of their own loss. This is abstracted away from the user, which makes it quite friendly.

Note: I am newer to PyTorch so if I explain something wrong someone please feel free to correct me.

David Alford


Thanks to both of you for your responses, it makes sense now.