RuntimeError: multi-target not supported at /Users/soumith/code/builder/wheel/pytorch-src/aten/src/THNN/generic/ClassNLLCriterion.c:21

How to map 1-D “y” labels (0-4) vector to logits output from NN for a Loss function?

Hello,

I am new to PyTorch and ML. After reading many forums and materials, I cannot find the best way to get a solution.

I have following INPUTS:

train file stored in h5 format with following labels:
[y] - ground truth labels (integers 0,1,2,3,4) - isolated as separate tensor for training purposes (Y), through pandas DF
[x1:x120] - feature vectors x1-x120, which gives 120 features as input vector, isolated as well for training (X1-X120 tensor coming from pandas DF)
around 45000 training instances (rows)

So I have constructed fc NN like this:
L1 - [120-100] nodes for input vectors
L2 - [100-50]
L3 - [50-20]
L4 - [20-5] nodes for outputs, either logits or probabilities after applying softmax

Each layer being activated through ReLU, no softmax at the end in 1 version, just the output itself (logits, right?)

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(120, 100)
        self.fc2 = nn.Linear(100, 50)
        self.fc3 = nn.Linear(50, 20)
        self.fc4 = nn.Linear(20, 5)

    def forward(self, x):
        #x = x.view(-1, 120)
        out1 = F.relu(self.fc1(x))
        out2 = F.relu(self.fc2(out1))
        out3 = F.relu(self.fc3(out2))
        y_pred = self.fc4(out3)
        return F.log_softmax(y_pred)
        #return y_pred

model = Net()
print(model)

What I am struggling with is how to convert either the logits or the 1-D y label array so that the dimensions match for CrossEntropyLoss, or even for NLLLoss after applying softmax?

I get a runtime error: multi-target not supported etc… (1D vs 5D)
Should I convert the ground truth labels to one-hot encoding, with an array like:
y
[00100] = 2
[01000] = 1
[00001] = 4
???

My Data Loader:

dataset = TrainDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=2)
My Training loop:
///
criterion = torch.nn.NLLLoss()
#criterion = torch.nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for batch_idx, (x_train, y_train) in enumerate(train_loader):
        x_train, y_train = Variable(x_train), Variable(y_train)
        opt.zero_grad()
        output = model(x_train)
        print(output)
        #y_train = y_train.squeeze_()
        loss = criterion(output, y_train)
        loss.backward()
        opt.step()
    print('Epoch %d: Loss %.5f' % (epoch, loss.item()))
///

I tried to squeeze_ the y labels, but it does not make sense and I get same values over iterations…

I am really lost :(( Can you help?

nn.NLLLoss expects your model to output a tensor of the shape [batch_size, nb_classes] and a target of [batch_size] containing the class indices for a usual multi-class classification use case.

If you are squeezing y_train, which shape does it have afterwards and what do you mean by “I get same values over iterations”?
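The shape requirements described above can be sketched as follows. This is a minimal illustration with random stand-in tensors (`logits` and `y_train` are made up here, not the poster's data): the target holds class indices of shape [batch_size], so no one-hot encoding is needed, and `nn.CrossEntropyLoss` on raw logits gives the same value as `nn.NLLLoss` on `log_softmax` outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, nb_classes = 10, 5

# Stand-ins (assumptions): random logits instead of a real model output,
# and random integer labels stored as a column, shape [10, 1].
logits = torch.randn(batch_size, nb_classes)
y_train = torch.randint(0, nb_classes, (batch_size, 1))

# Drop the trailing dimension so the target has shape [10].
target = y_train.squeeze(1)

# NLLLoss takes log-probabilities; CrossEntropyLoss takes raw logits.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
ce = nn.CrossEntropyLoss()(logits, target)

print(target.shape, nll.item(), ce.item())
```

Passing the unsqueezed [10, 1] target to `nn.NLLLoss` is the situation that triggers the "multi-target not supported" error in the PyTorch version used in this thread.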

Hi ptrblack,

thank you for your quick answer.

I am getting the shape of [batch_size, nb_classes], which is (10,5). The problem is with y_train (10,1).

Please disregard this line for now:
“and I get same values over iterations” - I assume the training first has to start working, because the loss seems to take unrealistic values; it stays more or less the same over 10 epochs and the results do not get better.

Anyway, when I use the squeezed y_train, I get (batch size 10, 5 classes):
tensor([2, 2, 0, 4, 1, 3, 2, 1, 1, 1])
torch.Size([10])

I tried eventually to adjust the logits after applying sigmoid into same shape by applying:
output = model(x_train)
output = torch.argmax(output, dim=-1)
and I get strange values:
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.Size([10])
Having the same shapes at last, I tried to apply BCELoss, but with no success.

But my wish is to get cross-entropy working at last. I do not know if I have to convert my y_train vector to one-hot encoding somehow for cross-entropy to see it properly?

On the other hand, I have read somewhere that NLL accepts a 1D vector of labels and matches it with the output layer (after applying log_softmax, 5 classes as output). Is that true? It does not work for me either…

Pls let me know if something is not clear.

Cheers,
Marek

Yes, that should work as explained above.

You should also apply F.log_softmax on your output, but this already seems to be the case.

Why does it not work for you? Do you get any error?
After squeezing the target, the shape looks alright.

I now have the following shapes for input and target:

tensor([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
torch.Size([10]) #this is input
tensor([4, 3, 0, 0, 1, 0, 2, 1, 3, 2])
torch.Size([10]) #this is target

I apply log_softmax to my output logits indeed:
///
y_pred = self.fc4(out3)
return F.log_softmax(y_pred)

…and transform input, target as follows in training loop:
input:
output = model(x_train)
output = torch.argmax(output, dim=-1)
target:
y_train = y_train.squeeze_(1)
y_train = torch.transpose(y_train, 0, -1)

what I get, is following error now:

The model output should be [batch_size, nb_classes], so [10, 5] in your case, with F.log_softmax(x, dim=1) applied on the logits.

You shouldn’t call torch.argmax on the outputs.

PS: This shouldn’t be an error, but you should either use

  • the inplace method of squeeze: y_train.squeeze_(1)
  • or the “vanilla” method and assign the result: y_train = y_train.squeeze(1)

(Methods with an underscore work inplace)
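The difference between the two `squeeze` variants, and the effect of `torch.argmax`, can be seen in a small sketch. The tensors here are toy values chosen for illustration, not the poster's data.

```python
import torch

torch.manual_seed(0)
y = torch.tensor([[2], [0], [4]])        # toy labels, shape [3, 1]

# Out-of-place: returns a squeezed tensor, y itself keeps shape [3, 1].
y_copy = y.squeeze(1)
shape_before = tuple(y.shape)

# In-place: y itself now has shape [3].
y.squeeze_(1)
shape_after = tuple(y.shape)

# argmax turns [batch_size, nb_classes] scores into class indices.
# The result is an integer tensor without gradient information, so it is
# useful for reading off predictions but not as input to the loss.
log_probs = torch.randn(3, 5, requires_grad=True).log_softmax(dim=1)
pred = torch.argmax(log_probs, dim=1)

print(shape_before, shape_after, tuple(pred.shape), pred.requires_grad)
```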

When I use y_train.unsqueeze_(1), I am still getting same error:
RuntimeError: multi-target not supported at /Users/soumith/code/builder/wheel/pytorch-src/aten/src/THNN/generic/ClassNLLCriterion.c:21

But when I try:
y_train = y_train.squeeze_(1)

The training finally proceeds, but I am getting strange loss values, e.g.:

Train Epoch: 3 [43100/45324 (95%)] Loss: 1.242024
Train Epoch: 3 [43200/45324 (95%)] Loss: 0.432863
Train Epoch: 3 [43300/45324 (96%)] Loss: 0.725757
Train Epoch: 3 [43400/45324 (96%)] Loss: 0.514305
Train Epoch: 3 [43500/45324 (96%)] Loss: 0.267429
Train Epoch: 3 [43600/45324 (96%)] Loss: 0.490571
Train Epoch: 3 [43700/45324 (96%)] Loss: 0.444628
Train Epoch: 3 [43800/45324 (97%)] Loss: 0.134719
Train Epoch: 3 [43900/45324 (97%)] Loss: 0.142233
Train Epoch: 3 [44000/45324 (97%)] Loss: 0.333635

It’s NLL after applying log_softmax.

Is it normal that the loss jumps like this, from about 0.13 to 1.24, in the training loop?
I would expect the model to be gradually improving to reach something like ~0.9 accuracy
(of course against the test set, which I have not done yet)

Sorry, that was a typo and you should use squeeze instead.

It might happen sometimes and it also depends on the batch size you are using. Larger batch sizes yield a smoother loss curve generally.
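A rough way to see the batch size effect: the printed loss is a mean over the batch, so its spread shrinks as the batch grows. The sketch below uses made-up per-sample losses (uniform random values, purely an assumption for illustration) and compares the standard deviation of the batch means for different batch sizes.

```python
import torch

torch.manual_seed(0)
# Hypothetical per-sample losses; the uniform distribution is only an assumption.
per_sample_loss = torch.rand(100_000) * 2

def batch_loss_std(batch_size):
    """Standard deviation of the mean loss across consecutive batches."""
    n = per_sample_loss.numel() // batch_size * batch_size
    batch_means = per_sample_loss[:n].view(-1, batch_size).mean(dim=1)
    return batch_means.std().item()

for bs in (10, 100, 1000):
    print(bs, batch_loss_std(bs))
```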

I got the code working finally, thank you!

The only problem is, the loss seems to be still terrible.

I am trying different learning rates = 0.01, 0.05, 0.001, optimizer SGD, Adam or Adagrad, CrossEntropy or NLL loss. Epochs = 5 or 10.

I am not sure if my NN works properly, the loss seems to get very strange values. What could be a realistic loss for such a multi-class classification problem? I am never lower than 0.5 for different sizes of batches (I tried 10, 50, 100, 1000), average is around 0.9:

Train Epoch: 4 [44500/45324 (98%)] Loss: 1.194766
Train Epoch: 4 [44600/45324 (98%)] Loss: 0.656055
Train Epoch: 4 [44700/45324 (99%)] Loss: 0.755115
Train Epoch: 4 [44800/45324 (99%)] Loss: 0.929295
Train Epoch: 4 [44900/45324 (99%)] Loss: 0.823785
Train Epoch: 4 [45000/45324 (99%)] Loss: 1.219415
Train Epoch: 4 [45100/45324 (99%)] Loss: 0.845087
Train Epoch: 4 [45200/45324 (100%)] Loss: 0.934597
Train Epoch: 4 [45300/45324 (100%)] Loss: 0.878224

///

When I submitted the results for our project, I got a very low baseline. Unfortunately, I do not have ground truth labels to evaluate the model on the test set.

My evaluation loop is as follows in this case:
///
model.eval()
y = []
pred = []

for i, (x_test) in enumerate(test_loader):
    x_test = Variable(x_test)
    output = model(x_test)
    output = torch.max(output.data,1)[1]
    y = output.numpy()[0]
    pred.append(y)
///

Would you mind having a look and shedding some light on that problem?

Thanks again!
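One thing that may be worth checking in the evaluation loop above: `output.numpy()[0]` keeps only the first prediction of each batch, so if `test_loader` uses a batch size larger than 1, the remaining rows are dropped. The thread does not show how `test_loader` is built, so the sketch below uses a stand-in model and random test data (both assumptions) to show a loop that keeps every prediction.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Stand-ins (assumptions): a tiny model and 25 random test rows.
model = nn.Sequential(nn.Linear(120, 5), nn.LogSoftmax(dim=1))
test_loader = DataLoader(TensorDataset(torch.randn(25, 120)), batch_size=10)

model.eval()
pred = []
with torch.no_grad():                        # no gradients needed for inference
    for (x_test,) in test_loader:            # TensorDataset yields 1-element batches
        output = model(x_test)               # [batch, 5] log-probabilities
        batch_pred = output.argmax(dim=1)    # predicted class index per row
        pred.extend(batch_pred.tolist())     # keep every prediction in the batch

print(len(pred))
```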

Are you dealing with an imbalanced dataset?
If so, you could add a weight argument to your loss function or use a WeightedRandomSampler so that your model does not overfit on a specific class.
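Both options can be sketched as follows. The class counts are hypothetical (the thread does not report the label distribution), and inverse-frequency weighting is just one common choice, not the only one.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)
# Hypothetical imbalanced labels for 5 classes and random 120-d features.
labels = torch.cat([torch.full((n,), c, dtype=torch.long)
                    for c, n in enumerate([700, 150, 80, 50, 20])])
features = torch.randn(len(labels), 120)

class_counts = torch.bincount(labels, minlength=5).float()
class_weights = class_counts.sum() / (5 * class_counts)   # inverse-frequency weights

# Option 1: weight the loss so that rare classes count more.
criterion = nn.NLLLoss(weight=class_weights)

# Option 2: resample so that batches are roughly balanced.
sample_weights = class_weights[labels]                    # one weight per sample
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)
loader = DataLoader(TensorDataset(features, labels), batch_size=100, sampler=sampler)

x_batch, y_batch = next(iter(loader))
print(torch.bincount(y_batch, minlength=5))
```

The two options are usually used one at a time; combining them would compensate for the imbalance twice.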

Also, try to overfit your model on a very small dataset (e.g. just 10 samples) and make sure the loss decreases towards zero.
If that’s not the case, you might have a bug in your code or the model architecture/hyperparameters are not suitable for this use case.
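A minimal sketch of this sanity check, assuming the same layer sizes as the model in the question and ten random samples in place of real data. For reference, a model that assigns equal probability to all five classes has a loss of ln 5 ≈ 1.61, so an untrained network should start somewhere near that value and then drop towards zero on such a tiny set.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Ten random samples stand in for a small slice of the real training set.
x_small = torch.randn(10, 120)
y_small = torch.randint(0, 5, (10,))

model = nn.Sequential(
    nn.Linear(120, 100), nn.ReLU(),
    nn.Linear(100, 50), nn.ReLU(),
    nn.Linear(50, 20), nn.ReLU(),
    nn.Linear(20, 5),        # raw logits; CrossEntropyLoss applies log_softmax itself
)
criterion = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)   # learning rate is an assumption

losses = []
for step in range(300):
    opt.zero_grad()
    loss = criterion(model(x_small), y_small)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

If the final loss stays close to the initial one even on ten samples, the problem is more likely in the data pipeline or the training loop than in the hyperparameters.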