Multi-label regression with target tensor as A) normalized class counts or just B) class counts?

I want to perform multi-label regression. The idea is to detect the presence of class objects in an input image. Each 224x224 image contains multiple objects from a total of six different classes. The label is a normalized count for each class, i.e. class_i_instances_per_image / sum(all_instances_per_image), e.g. 3/12 = 0.25, where i ∈ [0, 5].

target = tensor([[0.0588, 0.8235, 0.0882, 0.0000, 0.0000, 0.0294]], device='cuda:0', dtype=torch.float64)
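
For illustration, a target like the one above can be produced from raw per-class counts. The counts used here, [2, 28, 3, 0, 0, 1], are an assumption: they are one set of integer counts that reproduces the printed values to four decimals, not data taken from the post.

```python
import torch

# Hypothetical raw counts for the six classes (assumed, chosen to match the printed target).
counts = torch.tensor([[2., 28., 3., 0., 0., 1.]], dtype=torch.float64)

# Normalize each row so the class fractions sum to 1.
target = counts / counts.sum(dim=1, keepdim=True)
print(target)
```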

Output from the last FC layer:

tensor([[-0.0649,  0.0636, -0.0299, -0.0651, -0.8145, -0.3030]], device='cuda:0', grad_fn=<AddmmBackward>)

Prediction from the F.softmax() :

output = tensor([[0.1841, 0.2093, 0.1906, 0.1840, 0.0870, 0.1451]], device='cuda:0', grad_fn=<SoftmaxBackward>)

Computing the loss from the above tensors results in:

criterion = torch.nn.CrossEntropyLoss().cuda()
loss = criterion(torch.nn.functional.softmax(output), target)

RuntimeError: Expected object of scalar type Long but got scalar type Double for argument #2 ‘target’ in call to _thnn_nll_loss_forward

The suitable loss functions for regression problems are mean squared error (MSE) or mean absolute error (MAE). But F.mse_loss() takes a long time and returns inf values.

I would greatly appreciate your comments/suggestions on how to make use of these soft labels.

Hi Bran!

Could you clarify your use case?

Let’s say you have classes A through F. Let me focus on class C.
Do you wish your network to tell you just whether you have any
instances of class C in an image? That is “no” means no instances
of class C, while “yes” means you have at least one instance of C,
but you don’t care how many (as long as it’s not zero).

Or do you want your network to give you a quantitative measure
of how many instances of class C appear in your image? That is,
do you want a different network output if you have four instances
of C than if you have one?

Best.

K. Frank

I would like my network to provide a quantitative measure. It could be the probability for each class present OR the estimated class count.

Hi Bran!

My intuition would be to use MSELoss on the class counts.

To me this seems to make more sense than the class “probabilities”
(normalized class counts). Let’s say a sample image has one
instance of class B and one of class C, but your network only
detects class B. Your actual normalized class count is 1/2 for
class B (and class C, as well), but your predicted value is 1.

So your network got class B right – one instance – but is being
penalized because the normalized class count is wrong. It seems
to me that you want to give your network “full credit” for correctly
detecting exactly one instance of B (while penalizing it, of course,
for missing the instance of C).

Mean-squared-error makes sense because if, let’s say, you have
4 instances of C, detecting 3 or 5 is wrong, but not as wrong as
detecting 2 or 6. (Mean-absolute-error would also work this way;
I don’t have intuition about which would be better.)

Note, if you use normalized class counts you are throwing away
information that might be helpful in training your network. An image
with three instances of B and three of C has the same normalized
class counts as an image with one of each. But rewarding your
network for correctly detecting all of the B and C instances would
seem to be making better use of your training data.
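
The two points above can be checked numerically. In this minimal sketch the count vectors are made-up examples (the class order A to F is an assumption), and the helper name normalize is hypothetical.

```python
import torch

def normalize(counts):
    # Turn raw per-class counts into fractions that sum to 1.
    return counts / counts.sum()

# Classes A..F: three B and three C versus one B and one C.
three_each = torch.tensor([0., 3., 3., 0., 0., 0.])
one_each = torch.tensor([0., 1., 1., 0., 0., 0.])

# Both images collapse to the same normalized target, so the count information is lost.
print(normalize(three_each), normalize(one_each))

# Image with one B and one C; the network finds B but misses C.
true_counts = one_each
pred_counts = torch.tensor([0., 1., 0., 0., 0., 0.])

# Per-class squared error on raw counts: only the missed C is penalized.
err_counts = (pred_counts - true_counts) ** 2
# Per-class squared error on normalized counts: the correct B (1.0 vs 0.5) is penalized too.
err_norm = (normalize(pred_counts) - normalize(true_counts)) ** 2
print(err_counts, err_norm)

# With 4 true instances, predicting 3 costs less than predicting 2 under squared error.
print((4 - 3) ** 2, (4 - 2) ** 2)
```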

One comment on cross-entropy: If you want to go the probability
route, it is true that cross-entropy is a measure of how much two
probability distributions deviate from one another. But pytorch’s
built-in CrossEntropyLoss won’t work in this case because it
only handles the special case where the targets are integer class
labels – that is, where the target probability distribution is all 0.0’s
except for one 1.0. If you want to use cross-entropy for this case
you would have to write your own “soft-label” cross-entropy loss
function.
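
A hand-written soft-label cross-entropy of the kind described above might look like this minimal sketch. The function name soft_cross_entropy is hypothetical; it expects raw logits (not softmax outputs) and a target whose rows are probability distributions. As far as I know, later PyTorch releases (1.10 onward) also let CrossEntropyLoss accept class probabilities as targets, but that postdates this thread.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target_probs):
    # Cross-entropy between target and predicted distributions:
    # -sum_c target[c] * log_softmax(logits)[c], averaged over the batch.
    log_probs = F.log_softmax(logits, dim=1)
    return -(target_probs * log_probs).sum(dim=1).mean()

# Logits and normalized-count target taken from the first post.
logits = torch.tensor([[-0.0649, 0.0636, -0.0299, -0.0651, -0.8145, -0.3030]])
target = torch.tensor([[0.0588, 0.8235, 0.0882, 0.0000, 0.0000, 0.0294]])
loss = soft_cross_entropy(logits, target)
print(loss.item())

# Sanity check: with a one-hot target it reduces to the usual integer-label loss.
one_hot = torch.tensor([[0., 1., 0., 0., 0., 0.]])
hard_label = torch.tensor([1])
print(soft_cross_entropy(logits, one_hot).item(),
      F.cross_entropy(logits, hard_label).item())
```

Note that the loss takes the output of the last FC layer directly; applying softmax first and then log_softmax inside the loss would normalize twice.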

Good luck.

K. Frank

Thanks for the valuable explanation!

I gave MSELoss a try using the actual class counts as below:

criterion = torch.nn.MSELoss()
loss = criterion(output, target)

where

output = tensor([[-0.0272, -0.1516, -0.1283,  0.0954,  0.1588, -0.2674]], device='cuda:0', grad_fn=<AddmmBackward>)
target = tensor([[3., 0., 0., 0., 0., 1.]], device='cuda:0')
loss = tensor(1.8074, device='cuda:0', grad_fn=<MseLossBackward>)

The loss after a complete epoch is nan, probably because the output and target tensors differ by an order of magnitude. The output was obtained from a pre-trained VGG19 with relu activations in the linear layers (i.e. the classifier part). What options do I have to get this network working?

Hi Bran!

The immediate problem is that you are actually using BCELoss
(binary cross-entropy), rather than MSELoss (mean-squared-error)
as your loss function.

BCELoss is not conceptually appropriate for your class-count
use case. Also, technically, it requires its target values to be
probabilities – numbers between 0.0 and 1.0 – for its computations
to make any sense, and your class-count targets are often greater
than 1.

Best.

K. Frank

Sorry for the mistake! I’ve edited my recent reply and included another example here.

loss = torch.nn.MSELoss(output, target)

The above statement returns a high loss value. For instance, the loss is 169.3941 for the following input tensors:
output = [13.7210, 1.6992, -0.1286, -0.9545, -0.9148, 2.3547] and target = [0., 0., 0., 0., 14., 1.]

How may I get a non-negative integer output from the network?

Hi Bran!

Something fishy is going on here …

The statement torch.nn.MSELoss(output, target) incorrectly constructs
an instance of the MSELoss class, but doesn’t actually calculate the
value of the loss.

You want MSELoss() (output, target)
(or mse_loss (output, target)).

The value you quote for the loss, 169.3941, seems to be off by 100.0.
I get 69.3945.

Here is a pytorch version 0.3.0 script that illustrates these two issues:

import torch
torch.__version__

output = torch.autograd.Variable (torch.FloatTensor ([13.7210, 1.6992, -0.1286, -0.9545, -0.9148, 2.3547]))
target = torch.autograd.Variable (torch.FloatTensor ([ 0., 0., 0., 0., 14., 1.]))
torch.nn.MSELoss (output, target)               # construct a bogus instance of the class MSELoss
torch.nn.MSELoss() (output, target)             # construct a legitimate instance of the class MSELoss and apply it to (output, target)
torch.nn.functional.mse_loss (output, target)   # apply the function mse_loss() to (output, target)
((output - target)**2).mean()                   # compute mean-squared-error directly, just to check
169.3941 - ((output - target)**2).mean()        # your result is off by 100

Here is the output:

>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> output = torch.autograd.Variable (torch.FloatTensor ([13.7210, 1.6992, -0.1286, -0.9545, -0.9148, 2.3547]))
>>> target = torch.autograd.Variable (torch.FloatTensor ([ 0., 0., 0., 0., 14., 1.]))
>>> torch.nn.MSELoss (output, target)               # construct a bogus instance of the class MSELoss
MSELoss(
)
>>> torch.nn.MSELoss() (output, target)             # construct a legitimate instance of the class MSELoss and apply it to (output, target)
Variable containing:
 69.3945
[torch.FloatTensor of size 1]

>>> torch.nn.functional.mse_loss (output, target)   # apply the function mse_loss() to (output, target)
Variable containing:
 69.3945
[torch.FloatTensor of size 1]

>>> ((output - target)**2).mean()                   # compute mean-squared-error directly, just to check
Variable containing:
 69.3945
[torch.FloatTensor of size 1]

>>> 169.3941 - ((output - target)**2).mean()        # your result is off by 100
Variable containing:
 99.9996
[torch.FloatTensor of size 1]
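
For readers on a current PyTorch release, the same check can be written without Variable, which was merged into Tensor in version 0.4. This is a sketch of the equivalent calls, not a re-run of the session above.

```python
import torch
import torch.nn.functional as F

output = torch.tensor([13.7210, 1.6992, -0.1286, -0.9545, -0.9148, 2.3547])
target = torch.tensor([0., 0., 0., 0., 14., 1.])

criterion = torch.nn.MSELoss()                  # construct the loss module once
loss_module = criterion(output, target)         # then call it on (output, target)
loss_functional = F.mse_loss(output, target)    # functional form of the same computation
loss_manual = ((output - target) ** 2).mean()   # direct computation, just to check
print(loss_module.item(), loss_functional.item(), loss_manual.item())
```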

Good luck.

K. Frank

Thanks for your reply. I’m creating an instance of the MSELoss class and calling it later (as I mentioned in the original post),

but I simply abbreviated it as loss = torch.nn.MSELoss(output, target). Taken literally, that statement causes this error: RuntimeError: bool value of Tensor with more than one value is ambiguous.


I noticed the value returned by MSELoss function was not an average of individual losses of examples in the batch. For instance, considering the batch size = 2, the loss was different from the average of the two losses when batch size was 1 while inputs/ targets were kept the same. Which of these should be given more importance?

a) loss calculated per image (for batch size =1)
b) loss calculated per batch (for batch size > 1)
c) mean loss per epoch (irrespective of the batch size)


Hi Bran!

Could you post a small, self-contained, runnable script, together with
its complete output, that illustrates this issue? Don’t include a model
or anything – just hard wire the output and target tensors that you
pass to your loss function.
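
A script of the kind requested here could look like the following sketch. It reuses output/target pairs quoted earlier in the thread as a hard-wired batch of two; pairing them into one batch is an assumption made for illustration. With the default reduction='mean', MSELoss averages over all elements, so for equally sized samples the batch loss should match the average of the per-sample losses.

```python
import torch

criterion = torch.nn.MSELoss()  # default reduction='mean' averages over every element

# Two hard-wired samples (values quoted earlier in the thread), stacked into one batch.
output = torch.tensor([[-0.0272, -0.1516, -0.1283, 0.0954, 0.1588, -0.2674],
                       [13.7210, 1.6992, -0.1286, -0.9545, -0.9148, 2.3547]])
target = torch.tensor([[3., 0., 0., 0., 0., 1.],
                       [0., 0., 0., 0., 14., 1.]])

# Loss for the whole batch of two.
batch_loss = criterion(output, target)

# Loss for each sample on its own (batch size 1), then averaged.
per_sample = torch.stack([criterion(output[i:i + 1], target[i:i + 1])
                          for i in range(output.shape[0])])
print(batch_loss.item(), per_sample.mean().item())
```

If these two numbers agree while the losses seen during training do not, the difference is more likely to come from the network outputs than from the loss function.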

Best.

K. Frank

Thanks, Frank!

While preparing an example for you, I found some possible reasons for having different losses.


Brief context: I removed the nn.BatchNorm2d() layers to fine-tune a pretrained VGG19 but kept the nn.Dropout(p=0.5) layers (is this strategy fine for fine-tuning?).


At first, I thought the dropout layers following the nn.Linear layers altered the output on every pass, which is why I got different losses. So I ran another experiment to verify this. Here is what I did next:

  1. ̶R̶e̶m̶o̶v̶e̶d̶ ̶B̶a̶t̶c̶h̶N̶o̶r̶m̶(̶)̶ ̶l̶a̶y̶e̶r̶s̶
  2. Removed all nn.Dropout() layers
  3. Set a constant seed value for torch/Cuda random number generator
  4. Initialized the weight/bias of all three linear layers (in the classifier part)
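
The dropout effect suspected above can be reproduced in isolation. This sketch uses a standalone nn.Dropout layer rather than the VGG19 classifier, and the input tensor is made up: in training mode the layer zeroes random elements and rescales the rest by 1/(1-p), while in eval mode it passes the input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixes the sequence of random masks for reproducible runs

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()             # training mode: random mask, survivors scaled by 1 / (1 - p) = 2
train_out_1 = drop(x)
train_out_2 = drop(x)    # a second call draws a new mask, so it may differ from the first
print(train_out_1, train_out_2)

drop.eval()              # eval mode: dropout is disabled
eval_out = drop(x)
print(eval_out)
```

Calling model.eval() on a whole network switches every dropout (and batch-norm) layer to this deterministic behavior.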

To my surprise, the network still generated different outputs for the same inputs for batch size = 1, 2, and 4. The results are tabulated below. Could you help me understand this?


The green color indicates the same output for the relevant inputs.

Now, coming back to an older question.