def forward(self, x):
    ...
    # output0, output1: auxiliary classifier outputs; output2: final classifier output
    return output0, output1, output2

optimizer.zero_grad()                        # clear gradients from the previous step
output0, output1, output2 = model(inputs)
_, preds = torch.max(output2, 1)             # predictions come from the final classifier only
loss0 = criterion(output0, labels)           # loss of auxiliary classifier 1
loss1 = criterion(output1, labels)           # loss of auxiliary classifier 2
loss2 = criterion(output2, labels)           # loss of the final classifier
loss = loss2 + 0.3 * loss0 + 0.3 * loss1     # auxiliary losses weighted by 0.3
loss.backward()
optimizer.step()
output2 is the prediction of the final classifier, and the other two outputs come from the auxiliary classifiers. The overall loss is the final classifier's loss plus the two auxiliary losses, each weighted by 0.3, as in the snippet above.
As you can see, I make predictions from output2 only.
As the paper says, “At inference time, these auxiliary networks are discarded.”
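One way to make that explicit in code is to gate the auxiliary heads on self.training, so they are not even computed once the model is in eval mode. Below is a minimal, self-contained sketch of that pattern; the module names (backbone, aux1, aux2, fc) and the toy shapes are assumptions for illustration, and unlike the real Inception network the auxiliary heads here branch off the final feature map rather than intermediate layers.

import torch
import torch.nn as nn

class SketchNet(nn.Module):
    """Toy stand-in for a network with two auxiliary classifiers (not the real architecture)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
        self.aux1 = nn.Linear(128, num_classes)   # stand-in for auxiliary classifier 1
        self.aux2 = nn.Linear(128, num_classes)   # stand-in for auxiliary classifier 2
        self.fc = nn.Linear(128, num_classes)     # final classifier

    def forward(self, x):
        features = self.backbone(x)
        output2 = self.fc(features)               # always computed
        if self.training:
            # auxiliary outputs are only needed for the weighted training loss
            return self.aux1(features), self.aux2(features), output2
        return output2                            # inference: auxiliary networks are discarded

model = SketchNet()
model.eval()                                      # sets self.training = False
with torch.no_grad():
    inputs = torch.randn(4, 3, 32, 32)
    output2 = model(inputs)                       # only the final classifier output
    _, preds = torch.max(output2, 1)

Either way, the key point is the same: only output2 determines the predictions, and the auxiliary heads matter only through the gradients they add during training.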