# Backpropagation through multiple forward-propagations

Hello everyone.

While studying Unsupervised-Classification, I came across some logic I cannot understand. Here is the relevant code from the GitHub repo.

The function `scan_train` in `train_utils.py`:

``````
import numpy as np
import torch
from utils.utils import AverageMeter, ProgressMeter


def scan_train(train_loader, model, criterion, optimizer, epoch, update_cluster_head_only=False):
    """
    Train w/ SCAN-Loss
    """
    total_losses = AverageMeter('Total Loss', ':.4e')
    consistency_losses = AverageMeter('Consistency Loss', ':.4e')
    entropy_losses = AverageMeter('Entropy', ':.4e')
    progress = ProgressMeter(len(train_loader),
        [total_losses, consistency_losses, entropy_losses],
        prefix="Epoch: [{}]".format(epoch))

    if update_cluster_head_only:
        model.eval()  # No need to update BN
    else:
        model.train()  # Update BN

    for i, batch in enumerate(train_loader):
        # Forward pass
        anchors = batch['anchor'].cuda(non_blocking=True)
        neighbors = batch['neighbor'].cuda(non_blocking=True)

        if update_cluster_head_only:  # Only calculate gradient for backprop of linear layer
            with torch.no_grad():
                anchors_features = model(anchors, forward_pass='backbone')
                neighbors_features = model(neighbors, forward_pass='backbone')
            anchors_output = model(anchors_features, forward_pass='head')
            neighbors_output = model(neighbors_features, forward_pass='head')

        else:  # Calculate gradient for backprop of complete network
            anchors_output = model(anchors)
            neighbors_output = model(neighbors)

        # Loss for every head
        total_loss, consistency_loss, entropy_loss = [], [], []
        for anchors_output_subhead, neighbors_output_subhead in zip(anchors_output, neighbors_output):
            total_loss_, consistency_loss_, entropy_loss_ = criterion(anchors_output_subhead,
                                                                      neighbors_output_subhead)
            total_loss.append(total_loss_)
            consistency_loss.append(consistency_loss_)
            entropy_loss.append(entropy_loss_)

        # Register the mean loss and backprop the total loss to cover all subheads
        total_losses.update(np.mean([v.item() for v in total_loss]))
        consistency_losses.update(np.mean([v.item() for v in consistency_loss]))
        entropy_losses.update(np.mean([v.item() for v in entropy_loss]))

        total_loss = torch.sum(torch.stack(total_loss, dim=0))

        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()

        if i % 25 == 0:
            progress.display(i)
``````

The same model is used to compute two outputs:

``````
anchors_output = model(anchors)
neighbors_output = model(neighbors)
``````

and the two outputs are used to compute a loss:

``````
total_loss_, consistency_loss_, entropy_loss_ = criterion(anchors_output_subhead,
                                                          neighbors_output_subhead)
``````

and then backpropagation is performed:

``````
optimizer.zero_grad()
total_loss.backward()
optimizer.step()
``````

As far as I know, the output value of a node is needed to backpropagate gradients through that node.
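
For example (a toy snippet of my own, not from the SCAN repo), the gradient of `w` below is exactly the value of `x` that was saved during the forward pass:

``````
import torch

# Toy example: for y = w * x, the gradient dy/dw equals the forward value of x,
# so autograd must store x during the forward pass to use it in backward.
w = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(2.0)

y = w * x      # forward pass: autograd records x
y.backward()   # dy/dw = x

print(w.grad)  # tensor(2.)
``````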

But in the case above, there is not just a single output value: the model is evaluated twice before `backward()` is called.

Have I misunderstood backpropagation?

I'm not sure what exactly this is referring to. In your code snippet you are reducing the losses by applying `torch.sum` on them and are calling `total_loss.backward()` afterwards.
If you are concerned about the `backward` call on non-scalar values: by default PyTorch passes a gradient of `1.` to `backward()` if it is called on a scalar. If you are working with multiple values (a non-scalar tensor), you need to pass the incoming gradient to the `backward` operation explicitly.
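
A minimal sketch of the default behavior (the tensors here are made-up examples, not your training code):

``````
import torch

x = torch.ones(3, requires_grad=True)

# Scalar output: backward() implicitly uses a gradient of 1.
y = x * 2
y.sum().backward()
print(x.grad)  # tensor([2., 2., 2.])

x.grad = None  # clear the accumulated gradient

# Non-scalar output: the incoming gradient must be passed explicitly.
y = x * 2
y.backward(gradient=torch.ones_like(y))
print(x.grad)  # tensor([2., 2., 2.])
``````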

Thanks for your reply, and sorry for the late response.

My question is about calculating the partial derivatives.
Here is an example.

To find the gradients of the network, the output values from the forward pass are required.
a1, a2, a3 are the output values (activations) produced during training.
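
To make the example concrete (this is my own toy two-layer chain; the names are hypothetical, not from the repo):

$$
\begin{aligned}
z_1 &= w_1 x, & a_1 &= \sigma(z_1),\\
z_2 &= w_2 a_1, & a_2 &= \sigma(z_2),\\
L &= \ell(a_2),
\end{aligned}
\qquad
\frac{\partial L}{\partial w_2} = \frac{\partial L}{\partial a_2}\,\sigma'(z_2)\,a_1
$$

The forward-pass value a1 appears directly in the gradient of w2, which is why the output values must be available during backpropagation.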

This is my question:
the output values are used to obtain the gradients,
but the model is evaluated twice in the code above.

``````
anchors_output = model(anchors)
neighbors_output = model(neighbors)
``````

Which forward-pass values are used to compute the gradients?
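
To make the question concrete, here is a minimal sketch of the pattern I am asking about (a toy linear model of my own, not the SCAN network):

``````
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model and criterion
model = nn.Linear(4, 2)
criterion = nn.MSELoss()

anchors = torch.randn(8, 4)
neighbors = torch.randn(8, 4)
targets = torch.zeros(8, 2)

# Two forward passes: autograd records a separate graph branch for each,
# each with its own saved intermediate values.
anchors_output = model(anchors)
neighbors_output = model(neighbors)

total_loss = criterion(anchors_output, targets) + criterion(neighbors_output, targets)

model.zero_grad()
total_loss.backward()  # gradients from both branches accumulate in the shared weights
``````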

Thanks.