Add all the loss

WangBo12 · March 9, 2019, 2:10pm

Hi, I am a PyTorch beginner.
My problem is how to add up all the losses computed while iterating over the whole dataset.
The code below should make the problem clearer.
Thanks.

# my network
class MyNet(nn.Module):
    ...
    def forward(self, input):
        ...
        return a, b
net = MyNet()

# my dataloader for my own dataset
train_loader = DataLoader(
                       dataset=train_data,
                       shuffle=True,
                       batch_size=1)

# my training
# the optimizer and the criterion have been defined
for epoch in range(num_epochs):
    running_loss = 0.0
    optimizer.zero_grad()
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
        outputs_a, outputs_b = net(inputs)
        loss_a = criterion(outputs_a, labels)
        loss_b = criterion(outputs_b, labels)
        running_loss  = running_loss + loss_a.item() + loss_b.item()
    running_loss.backward()
    optimizer.step()

I know there must be something wrong with my running_loss. What I want to do is, for each epoch, add up the losses from all iterations over the dataset and then call backward once.
Is batch_size = 1 correct or not? And how should I define running_loss? In some posts I saw that loss.data[0] should be added together.

Thanks.

ptrblck · March 9, 2019, 2:26pm

If you want to call backward() on running_loss, you should just add the losses together without calling .item() on them.
Calling .item() returns a plain Python number, which detaches the loss from the computation graph (no backward possible anymore); use it when you want to store the value for debugging purposes, e.g. printing.
In your case, however, you need the computation graphs.
Another approach would be to call (loss_a + loss_b).backward() inside the inner loop. This should yield the same results, as the gradients are accumulated. Have a look at this post for more information.
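
To make the two approaches above concrete, here is a minimal sketch comparing them. The linear model, the random data, and the variable names are assumptions chosen for illustration and do not come from the original code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (assumptions): a small linear model and random data.
net = nn.Linear(4, 3)
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(5, 4)
labels = torch.randint(0, 3, (5,))

# Approach 1: add the loss tensors (no .item()), then call backward() once.
net.zero_grad()
running_loss = 0.0
for x, y in zip(inputs, labels):
    loss = criterion(net(x.unsqueeze(0)), y.unsqueeze(0))
    running_loss = running_loss + loss  # stays a tensor attached to the graph
running_loss.backward()
grad_summed = net.weight.grad.clone()

# Approach 2: call backward() per sample and let the gradients accumulate.
net.zero_grad()
for x, y in zip(inputs, labels):
    loss = criterion(net(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
grad_accumulated = net.weight.grad.clone()

same_grads = torch.allclose(grad_summed, grad_accumulated, atol=1e-6)
item_type = type(loss.item())  # a plain Python float, with no graph attached
print(same_grads, item_type)
```

Approach 1 keeps every sample's graph alive until the single backward() call, so it tends to use more memory than approach 2, which can free each graph right after its own backward().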

WangBo12 · March 10, 2019, 7:42am

That post helped me a lot, but I am still a little confused.
First, does it mean that if the batch_size of train_loader is 1, I can write the code like this, because of the accumulated gradients you mentioned?

criterion = nn.CrossEntropyLoss()
for epoch in range(num_epochs):
    optimizer.zero_grad()
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
        outputs_a, outputs_b = net(inputs)
        loss_a = criterion(outputs_a, labels)
        loss_b = criterion(outputs_b, labels)
        (loss_a + loss_b).backward()
    optimizer.step()

Second, if the batch_size of the train_loader is more than 1, e.g. 16, can I just change the criterion like this and get the same result?

# size_average is deprecated; reduction='sum' alone is enough
criterion = nn.CrossEntropyLoss(reduction='sum')

$\mathrm{Loss}_{\mathrm{total}} = \sum_{i=1}^{N} \left(\mathrm{loss}_{a,i} + \mathrm{loss}_{b,i}\right)$ is what I want to minimize,
where N is the number of examples in my dataset.
Thanks~

imaluengo · March 10, 2019, 9:30am

To the first question: yes, you can. Every time you call loss.backward(), it accumulates gradients in the network. They are only applied to the weights when you call optimizer.step(), at the end of the loop.

You can only do this because your optimizer.zero_grad() is outside the inner loop, once per epoch. If zero_grad() were called in the inner loop, once per batch, which is (arguably) more common in docs and examples, the gradients would be reset to zero after every batch.
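
The effect of where zero_grad() is placed can be seen in a small sketch. The toy linear model and random inputs are assumptions. optimizer.step() is left out on purpose so that only the accumulation behaviour is visible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (assumptions): for y = w.x + b, the gradient of y w.r.t. w is x.
net = nn.Linear(2, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
x = torch.randn(3, 2)

# zero_grad() once, outside the loop: gradients from all samples add up.
optimizer.zero_grad()
for sample in x:
    net(sample).sum().backward()
grad_outside = net.weight.grad.clone()

# zero_grad() inside the loop: only the last sample's gradient is left.
for sample in x:
    optimizer.zero_grad()
    net(sample).sum().backward()
grad_inside = net.weight.grad.clone()

print(grad_outside)  # equals the sum of all rows of x
print(grad_inside)   # equals the last row of x
```

In the usual per-batch training loop, optimizer.step() sits inside the loop as well, so resetting the gradients after each batch is the intended behaviour there.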

To the second question: again, yes. If you sum the losses instead of averaging them, your code will yield the same results regardless of your batch size.
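
As a rough check of the batch-size claim, the sketch below accumulates gradients over one full pass for different batch sizes and reductions. The dataset, the model, and the helper epoch_grad are hypothetical names introduced only for this illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy stand-ins (assumptions): 8 random samples, 3 classes, a linear model.
data = TensorDataset(torch.randn(8, 4), torch.randint(0, 3, (8,)))
net = nn.Linear(4, 3)

def epoch_grad(batch_size, reduction):
    """Accumulate gradients over one full pass and return a copy (hypothetical helper)."""
    criterion = nn.CrossEntropyLoss(reduction=reduction)
    net.zero_grad()
    for inputs, labels in DataLoader(data, batch_size=batch_size):
        criterion(net(inputs), labels).backward()
    return net.weight.grad.clone()

sum_bs1 = epoch_grad(1, "sum")
sum_bs4 = epoch_grad(4, "sum")
mean_bs1 = epoch_grad(1, "mean")
mean_bs4 = epoch_grad(4, "mean")

# With 'sum' the accumulated gradient does not depend on the batch size.
print(torch.allclose(sum_bs1, sum_bs4, atol=1e-5))
# With 'mean' each batch is divided by its size, so batch_size=4 gives 1/4 of batch_size=1.
print(torch.allclose(mean_bs1, 4 * mean_bs4, atol=1e-5))
```

With the default reduction='mean', the accumulated gradient scales with 1/batch_size, which is why the summed criterion is the one that matches the total loss over all N examples.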

WangBo12 · March 10, 2019, 11:43am

Thank you! I learned a lot.

ptrblck · March 10, 2019, 4:00pm

As @imaluengo explained, both approaches should yield the same results.
However, you would have to be careful if your model contains any nn.BatchNorm layers, as the running estimates will differ between these approaches.
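
The BatchNorm caveat can be illustrated with a small sketch that feeds the same four values to nn.BatchNorm1d, once as a single batch and once as two batches of two. The input values and variable names are assumptions picked so the arithmetic is easy to follow; the default momentum of 0.1 is used.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])

# One batch of 4: running_mean = 0.9 * 0 + 0.1 * 2.5
bn_full = nn.BatchNorm1d(1)
bn_full.train()
bn_full(x)

# Two batches of 2: running_mean = 0.9 * (0.1 * 1.5) + 0.1 * 3.5
bn_split = nn.BatchNorm1d(1)
bn_split.train()
for chunk in x.split(2):
    bn_split(chunk)

full_mean = bn_full.running_mean.item()
split_mean = bn_split.running_mean.item()
print(full_mean, split_mean)  # approximately 0.25 and 0.485
```

In training mode the layer also normalizes with the statistics of the current batch, so the outputs themselves depend on how the data is batched. With batch_size=1, nn.BatchNorm1d on a 2D input is expected to raise an error in training mode, because a single value per channel gives no variance estimate.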