I wrote a train routine that takes in an arbitrary model, data loader, and loss criterion, and which contains the following code:
for batch_idx, (data, target) in enumerate(train_loader):
    data, target = data.to(device), target.to(device)
    output = model(data)
    loss = criterion(output, target)*datasize
output is a 2D tensor and target is a 1D tensor. The problem is that when using NLL loss as the criterion (say, for MNIST classification), everything works fine as is, but if I use BCE loss (for some binary classification), Torch complains that the criterion requires both tensors to have the same shape.
Since, for binary classification, output will be a 2D tensor with size (B, 1), where B is the batch size, a simple squeeze() would be enough. But I want my code to work in both cases, i.e. with BCE loss and with NLL loss. I already tried calling squeeze() on the return value of the forward() pass inside my model, but that did not work either. What is the usual way to do this?
Yes, for BCE loss the target tensor needs to have the same shape as the output.
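To make the shape requirement concrete, here is a minimal sketch (tensor sizes chosen arbitrarily for illustration) of the mismatch and the squeeze() fix:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

output = torch.sigmoid(torch.randn(8, 1))   # model output, shape (B, 1)
target = torch.randint(0, 2, (8,)).float()  # binary labels, shape (B,)

# criterion(output, target) would raise an error here:
# BCELoss requires output and target to have identical shapes.

# Squeezing the trailing dimension makes both (B,), so the call succeeds.
loss = criterion(output.squeeze(1), target)
```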
Please note that BCE loss is appropriate for a multi-class multi-label setting while NLLLoss is appropriate for a multi-class single-label setting.
So, in my view, only one of these should be appropriate for the problem at hand. Is there anything that I am missing?
Could you elaborate on your task and why you’d like to use both?
Well, I would like to write a routine that works on supervised classification tasks. I don’t understand what you mean by “multi-label”. If I have N classes, I have N labels, one for each class.
For binary classification (2 classes and labels 0 or 1), I need to use BCE. For something like MNIST (N classes, N labels with N>2), I need to use NLL.
Is there really no way of handling this? How does one write general training routines, then, that work with different criteria?
You could always add conditions and transform the outputs and targets to the desired shape and dtype if needed. Usually you don't care about this kind of abstraction, since your use case defines the actual criterion, and allowing other loss functions to work with your data often doesn't make sense.
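Such a condition could look like the following sketch: a small wrapper (the helper name compute_loss is my own, not from the thread) that adapts shapes and dtypes depending on the criterion, so one training loop can serve both cases:

```python
import torch
import torch.nn as nn

def compute_loss(criterion, output, target):
    """Adapt output/target so one train loop works with several criteria.
    A sketch, not exhaustive -- extend the conditions for other losses."""
    if isinstance(criterion, (nn.BCELoss, nn.BCEWithLogitsLoss)):
        # BCE-style losses want output and target of identical shape,
        # with float targets: squeeze the (B, 1) output down to (B,).
        return criterion(output.squeeze(-1), target.float())
    # NLLLoss / CrossEntropyLoss want (B, C) outputs and (B,) integer targets.
    return criterion(output, target.long())

# Binary case: (B, 1) logits, (B,) 0/1 labels.
bce = nn.BCEWithLogitsLoss()
out_bin = torch.randn(4, 1)
tgt_bin = torch.randint(0, 2, (4,))
loss_bin = compute_loss(bce, out_bin, tgt_bin)

# Multi-class case: (B, C) log-probabilities, (B,) class indices.
nll = nn.NLLLoss()
out_mc = torch.log_softmax(torch.randn(4, 10), dim=1)
tgt_mc = torch.randint(0, 10, (4,))
loss_mc = compute_loss(nll, out_mc, tgt_mc)
```

The dispatch on the criterion's type keeps the training loop itself untouched; the cost is that every new loss family needs its own branch.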
A multi-label multi-class problem is one where there are more than two (multiple) classes and a data point can belong to more than one class at a time.
BCEWithLogitsLoss (with no softmax() or sigmoid() applied to the output, since it applies sigmoid internally) is the right loss function for such tasks.
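A minimal multi-label sketch (sizes B and C are arbitrary): the targets are multi-hot floats with the same (B, C) shape as the raw model scores.

```python
import torch
import torch.nn as nn

B, C = 4, 5
logits = torch.randn(B, C)                     # raw scores, no activation applied
targets = torch.randint(0, 2, (B, C)).float()  # multi-hot labels, same shape

criterion = nn.BCEWithLogitsLoss()             # applies sigmoid internally
loss = criterion(logits, targets)
```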
A single-label multi-class problem is the one where there are more than two (multiple) classes and a data point can belong to only one class.
CrossEntropyLoss is the right loss function for such tasks.
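And the corresponding single-label sketch: here the targets are plain class indices of shape (B,), not one-hot vectors, and the model outputs raw scores.

```python
import torch
import torch.nn as nn

B, C = 4, 5
logits = torch.randn(B, C)           # raw scores; log_softmax is applied internally
targets = torch.randint(0, C, (B,))  # one class index per sample, shape (B,)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)
```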