Loss.backward() for two different nets

Hi, let’s say I have two networks, “net1” and “net2”, with “loss1” and “loss2” as their respective loss functions, and “optimizer1” and “optimizer2” as their optimizers.

My losses are computed as:
loss1 = criterion(outputs1, labels1)
loss2 = criterion(outputs2, labels2)

Now, I want to backprop the loss to net1 and net2. I use:
loss1.backward()
loss2.backward()
optimizer1.step()
optimizer2.step()

Is this a correct way? I am just confused about how loss1 is associated with net1 for computing gradients, and loss2 with net2. I know that loss1 is computed from the outputs of net1, but I want to ask how “loss1.backward()” computes the gradients for net1.

My goal: compute the loss1 from net1 and just backprop to net1, and same for net2.

Hi,

The trick we use is to store with the Tensor enough information to know how it was created (to be able to compute the gradients).
So loss1 “knows” that it was computed from the parameters of net1, and when you call .backward(), it will compute the gradients w.r.t. those parameters (whatever you do on the side with other nets).
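You can see this directly: call .backward() on a loss built from one net and check that only that net’s parameters receive gradients. A minimal sketch (the two tiny Linear nets here are just placeholders, not your actual models):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# two independent hypothetical nets
net1 = nn.Linear(4, 2)
net2 = nn.Linear(4, 2)

x = torch.randn(3, 4)
loss1 = net1(x).sum()  # loss built only from net1's output

loss1.backward()

# only net1's parameters received gradients
print(net1.weight.grad is not None)  # True
print(net2.weight.grad is None)      # True
```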


PyTorch comes with a component called autograd, which provides automatic differentiation for all operations on Tensors; Tensors can remember where they “came from”.

From the PyTorch docs:

torch.Tensor is the central class of the package. If you set its attribute .requires_grad as True , it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into .grad attribute.
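The two behaviors described in that quote, tracking operations and accumulating into .grad, can be seen with a couple of lines (a toy example, not from the thread):

```python
import torch

# requires_grad=True makes autograd track operations on x
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad)  # dy/dx = 2x -> tensor([4., 6.])

# gradients ACCUMULATE into .grad on a second backward pass
y2 = (x ** 2).sum()
y2.backward()
print(x.grad)  # tensor([8., 12.])
```

This accumulation is why training loops call optimizer.zero_grad() before each backward pass.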

When a neural network is defined in PyTorch, it subclasses torch.nn.Module. All your submodules and layers can be initialized in the module, which leads to their parameters being tracked by the Module.

Let us say Net1 looked like this (subclassing nn.Module)

import torch.nn as nn
import torch.nn.functional as F

class Net1(nn.Module):
    def __init__(self):
        super(Net1, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Then you instantiate the model and call the instance on your input to get an output (calling the class itself, as in Net1(x), would not work):

net1 = Net1()
outputs1 = net1(x)

This output has been produced by the forward() pass of net1. Then you calculate the loss:

loss1 = criterion(outputs1, labels1)

Now when we call the .backward() method on the loss tensor (not on the optimizer), autograd backpropagates through the tensors that have requires_grad set to True and computes the gradient of the loss w.r.t. the parameters, all the way back to where they came from.

Then when you call optimizer1.step(), it looks at each parameter’s .grad and updates the parameter; for plain SGD this means subtracting learning_rate times the gradient from it.
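That update rule can be checked by computing it by hand and comparing against what step() produces. A sketch assuming plain SGD (no momentum or weight decay) on a hypothetical one-layer net:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(2, 1)
opt = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(4, 2)
loss = net(x).pow(2).mean()
loss.backward()

# what opt.step() does for plain SGD: p <- p - lr * p.grad
expected = net.weight.detach().clone() - 0.1 * net.weight.grad
opt.step()
print(torch.allclose(net.weight, expected))  # True
```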

The key here is that Tensors know where they came from, so each loss is backpropagated automatically through the net that produced it - both your Nets know exactly where they came from. This is abstracted away from the user, which makes it quite friendly :slight_smile:
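Putting it all together, your original four lines fit into a training step like this sketch (the tiny Linear nets, learning rate, and MSE criterion are placeholders for your actual setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net1, net2 = nn.Linear(4, 2), nn.Linear(4, 2)
optimizer1 = torch.optim.SGD(net1.parameters(), lr=0.01)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=0.01)
criterion = nn.MSELoss()

inputs = torch.randn(8, 4)
labels1, labels2 = torch.randn(8, 2), torch.randn(8, 2)

# clear old gradients (they accumulate otherwise)
optimizer1.zero_grad()
optimizer2.zero_grad()

loss1 = criterion(net1(inputs), labels1)
loss2 = criterion(net2(inputs), labels2)

loss1.backward()   # fills .grad only for net1's parameters
loss2.backward()   # fills .grad only for net2's parameters
optimizer1.step()  # updates only net1
optimizer2.step()  # updates only net2
```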

Note: I am newer to PyTorch so if I explain something wrong someone please feel free to correct me.

David Alford


Thanks to both of you for your responses, it makes sense now. :slight_smile: