Are they equivalent?

```
a = ...
loss1 = net(a)

b = ...
loss2 = net(b)

c = loss1 + loss2
c.backward()
optimizer.step()
```
```
a = ...
loss1 = net(a)
loss1.backward()

b = ...
loss2 = net(b)
loss2.backward()
optimizer.step()
```

Hi,

I think there was a small mistake in your first part; I assume you meant `c = loss1 + loss2`. If so, the behaviour is the same. Indeed, `.backward()` accumulates (sums) the gradients as long as you don't call `optimizer.zero_grad()` (or `net.zero_grad()`) in between.
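This accumulation is easy to see on a tiny tensor (a minimal standalone sketch, separate from the snippets above):

```python
import torch

# w.grad sums across backward() calls until it is explicitly zeroed.
w = torch.ones(2, requires_grad=True)

(w * 2).sum().backward()
print(w.grad)            # tensor([2., 2.])

(w * 3).sum().backward()
print(w.grad)            # tensor([5., 5.]) -- accumulated, not replaced

w.grad.zero_()           # this is what optimizer.zero_grad() does
print(w.grad)            # tensor([0., 0.])
```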

So to be clear: in the first part you computed grad(loss1 + loss2), while in the second you accumulated grad(loss1) + grad(loss2).

But the gradient is a linear operator, so the two are equal. You can check with the code below, which only needs `import torch`; I modified a PyTorch example. Both ways of calling backward produce the same gradients.

Cheers.

```
import torch

device = torch.device('cpu')

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 3, 6, 6, 2

# Create random Tensors to hold inputs and outputs.
x = torch.randn(N, D_in, device=device)
a = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
b = torch.randn(N, D_out, device=device)

# We want to compute gradients for these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

learning_rate = 1e-6
for t in range(3):
    # Forward pass: compute predictions using operations on Tensors. Since w1
    # and w2 have requires_grad=True, PyTorch builds a computational graph,
    # allowing automatic computation of gradients.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    b_pred = a.mm(w1).clamp(min=0).mm(w2)

    # Compute and print the losses. Each loss is a Tensor of shape (), and
    # loss.item() is a Python number giving its value.
    loss1 = (y_pred - y).pow(2).sum()
    loss2 = (b_pred - b).pow(2).sum()
    print(t, loss1.item())
    print(t, loss2.item())

    # Use autograd to compute the backward pass. These calls compute the
    # gradients of the losses with respect to w1 and w2; the second call
    # accumulates (sums) into the existing .grad attributes.
    loss1.backward(retain_graph=True)
    loss2.backward(retain_graph=True)

    # Update weights using gradient descent. For this step we just want to
    # mutate the values of w1 and w2 in place, so we use the torch.no_grad()
    # context manager to prevent PyTorch from building a computational graph
    # for the updates, and then zero the gradients by hand.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
```
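As an even more direct check of the linearity argument, the accumulated gradients from the two schemes can be compared on the same weights (a minimal sketch with a single hypothetical weight matrix, not the exact code above):

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, 2, requires_grad=True)
x1 = torch.randn(3, 4)
x2 = torch.randn(3, 4)

# Scheme 1: sum the losses, then a single backward.
loss = x1.mm(w).pow(2).sum() + x2.mm(w).pow(2).sum()
loss.backward()
g_sum = w.grad.clone()

# Scheme 2: backward on each loss separately; gradients accumulate in w.grad.
w.grad.zero_()
x1.mm(w).pow(2).sum().backward()
x2.mm(w).pow(2).sum().backward()
g_sep = w.grad.clone()

print(torch.allclose(g_sum, g_sep))  # True: grad(l1 + l2) == grad(l1) + grad(l2)
```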