I want to use one of the “torchvision” models in a regression problem where the output is a binary vector of size 64. One option is to replace the classification layer with a regression layer. As I want to leave the model intact, I am thinking of doing it as follows:

...
import torch.nn.functional as F
...
model= get_model('vgg16', num_classes = 64, ....)
...
for batch_idx, (data, target) in enumerate(train_loader):
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    output = model(data)
    output_as_regressor = F.sigmoid(output)  # element-wise; sigmoid takes no dim argument
    loss += criterion(output_as_regressor, target).item() # criterion = nn.CrossEntropyLoss()
...
...

Should I use binary_cross_entropy_with_logits or binary_cross_entropy instead of CrossEntropyLoss?

I haven’t tried it yet and am not sure whether I am missing something, so I would appreciate any thoughts on it.

Could you explain a bit about your target?
If you apply F.sigmoid, you should use BCELoss instead of BCEWithLogitsLoss, since BCEWithLogitsLoss already applies the sigmoid internally.
Are your targets in the range [0, 1]? The sigmoid saturates, which makes it quite hard for your model to predict values close to 0 and 1. So you could try MSELoss instead and see if it’s better suited to your problem.
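
To illustrate the BCELoss vs. BCEWithLogitsLoss point, here is a minimal sketch. The batch size, the random logits, and the random targets are made up for illustration; only the two loss classes and the output size of 64 come from the thread. It shows that applying the sigmoid yourself and using BCELoss gives the same value as passing raw logits to BCEWithLogitsLoss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 64 outputs each (raw model outputs, i.e. logits).
logits = torch.randn(4, 64)
# Binary targets must be floats for both BCE variants.
target = torch.randint(0, 2, (4, 64)).float()

# Option A: apply the sigmoid yourself, then use BCELoss on probabilities.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), target)

# Option B: pass raw logits; the sigmoid is applied inside the loss.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, target)

print(loss_from_probs.item(), loss_from_logits.item())
```

The two values should agree up to floating-point error. The logits variant is generally the more numerically stable of the two, so if you do not need the probabilities during training, you can drop the explicit sigmoid and use it instead.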

The target is a binary vector, for example, [1, 0, 0, 0, 1, 0, 1, 1,...,0] (length 64).
I think it is possible to use rounding on the final outputs to force them to either 0 or 1.
So, I will try sigmoid + MSELoss.

I don’t think this is a problem specific to using a torchvision model. It seems that what you want is for the model to output a binary vector, instead of the raw output you get from the model directly.

First, you need your own rule for getting 0s and 1s from the output of the model. For example, if the model outputs [-2.3, 4.4, 2.0, -0.8], you need to specify which binary vector that should map to. In the conventional softmax approach for single-label tasks, the maximum entry is set to 1 and all others to 0.
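
As a rough sketch of two such rules, applied to the example output above: the single-label rule puts a 1 at the maximum, while a per-element rule (an assumption here, thresholding the sigmoid at 0.5) lets several entries be 1 at once. The variable names are only illustrative:

```python
import torch
import torch.nn.functional as F

output = torch.tensor([-2.3, 4.4, 2.0, -0.8])

# Single-label rule: 1 at the position of the maximum, 0 everywhere else.
one_hot = F.one_hot(output.argmax(), num_classes=output.numel())

# Per-element rule: each entry is decided independently.
# sigmoid(x) > 0.5 is the same condition as x > 0.
multi_hot = (torch.sigmoid(output) > 0.5).long()

print(one_hot.tolist())    # [0, 1, 0, 0]
print(multi_hot.tolist())  # [0, 1, 1, 0]
```

Which rule is appropriate depends on whether a target can contain more than one 1.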

Then you have the output of the original model, which is not binary, and the binary target, and (maybe?) you can extend the definition of cross entropy to the multi-label task (where the target is not a binary vector with a single 1).

Well, if your target is (either 0 or 1) and one sample might have multiple ones, I think it’s a multi-label classification task, but please correct me, if I’m wrong.

If that’s the case, you can apply sigmoid on your output, making it possible to output several classes for each sample, and use BCELoss.
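
A minimal sketch of one such training step follows. The small linear model, the input size of 32, the batch size, and the SGD settings are stand-ins chosen only to keep the example self-contained; in the original setup the model would be the torchvision network with 64 outputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a torchvision model whose last layer has 64 outputs (assumption).
model = nn.Sequential(nn.Linear(32, 64))
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Hypothetical batch: 8 samples with multi-hot float targets of length 64.
data = torch.randn(8, 32)
target = torch.randint(0, 2, (8, 64)).float()

optimizer.zero_grad()
probs = torch.sigmoid(model(data))   # one independent probability per output
loss = criterion(probs, target)      # keep the tensor (no .item()) so backward works
loss.backward()
optimizer.step()

print(loss.item())
```

Note that the loss tensor itself is used for `backward()`; calling `.item()` is only needed when accumulating a Python number for logging.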

One aim of the question is about adding a regressor on top of a classifier model; if this is possible, it will help a lot of people who are dealing with such problems.
The problem I presented is one example. For a binary vector of size 512, there are 2^512 possible states depending on the binary entries, right?

I will use sigmoid + BCELoss and see how it goes.

@deJQK using sigmoid, the output will be bounded in the range [0, 1]. A simple rounding, thus, will do.