Hello all, I have a loss function of the form
loss = loss1 + 0.1 * loss2
where loss1 and loss2 are both CrossEntropyLoss. loss1 takes the network outputs and the ground-truth labels as its two inputs (I call it the Supervised Loss), while loss2 takes the outputs and pseudo-labels derived from those same outputs (just threshold the outputs); I call it the Unsupervised Loss. The two terms are balanced by the weight 0.1.
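By "threshold the outputs" I mean roughly the following (a small sketch with my own variable names, not my exact code; since CrossEntropyLoss expects class indices I take the argmax as the pseudo-label, and the 0.5 threshold is meant to keep only confident predictions):

import torch
import torch.nn.functional as F

outputs = torch.randn(4, 3)              # dummy logits: batch of 4 samples, 3 classes
probs = F.softmax(outputs, dim=1)        # per-class probabilities
conf, pseudo_labels = probs.max(dim=1)   # predicted class and its confidence
mask = conf > 0.5                        # keep only confident samples for the unsupervised term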
This is my implementation:
optimizer.zero_grad()
###############
#Loss1: given images and labels
###############
criterion = nn.CrossEntropyLoss().to(device)
outputs = model(images)
loss1 = criterion(outputs, labels)
loss1.backward()
###############
#Loss2: given images only (pseudo-labels come from the outputs)
###############
outputs = model(images)
# pseudo-labels: predicted class per sample (CrossEntropyLoss expects class indices)
_, pseudo_labels = torch.max(outputs, 1)
loss2 = 0.1*criterion(outputs, pseudo_labels.detach())
loss2.backward()
optimizer.step()
Could you look at my implementation and give me comments on two things:
- Is the implementation correct for performing loss = loss1 + 0.1*loss2?
- Should optimizer.step() and optimizer.zero_grad() be applied once at the end of the step, or after each loss's backward() call?
For the second point, I mean:
optimizer.zero_grad()
###############
#Loss1: given images and labels
###############
criterion = nn.CrossEntropyLoss().to(device)
outputs = model(images)
loss1 = criterion(outputs, labels)
###############
#Loss2: given images
###############
outputs = model(images)
# pseudo-labels: predicted class per sample (CrossEntropyLoss expects class indices)
_, pseudo_labels = torch.max(outputs, 1)
loss2 = criterion(outputs, pseudo_labels.detach())
loss = loss1 + 0.1*loss2   # the 0.1 weight is applied once, here
loss.backward()
optimizer.step()
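In case it is useful, here is a small toy check I put together to convince myself that the two variants match (the nn.Linear model, the random data, and all names are made up by me). My understanding is that calling backward() per loss and calling it once on the weighted sum should leave the same values in .grad, since step() and zero_grad() only run once per iteration; I would appreciate confirmation that this is the right way to reason about it:

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 3)                      # toy model
images = torch.randn(4, 8)                   # dummy batch
labels = torch.randint(0, 3, (4,))           # dummy ground-truth labels
criterion = nn.CrossEntropyLoss()

def grads_per_loss_backward():
    model.zero_grad()
    outputs = model(images)
    loss1 = criterion(outputs, labels)
    loss1.backward()
    outputs = model(images)
    _, pseudo = torch.max(outputs, 1)
    loss2 = 0.1 * criterion(outputs, pseudo.detach())
    loss2.backward()                         # accumulates into the existing .grad
    return [p.grad.clone() for p in model.parameters()]

def grads_summed_loss_backward():
    model.zero_grad()
    outputs = model(images)
    loss1 = criterion(outputs, labels)
    outputs = model(images)
    _, pseudo = torch.max(outputs, 1)
    loss2 = criterion(outputs, pseudo.detach())
    loss = loss1 + 0.1 * loss2
    loss.backward()
    return [p.grad.clone() for p in model.parameters()]

g1 = grads_per_loss_backward()
g2 = grads_summed_loss_backward()
print(all(torch.allclose(a, b) for a, b in zip(g1, g2)))   # expect True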
Thanks in advance!