nll_loss in CrossEntropyLoss: "Target -1 is out of bounds" error

flipflop · October 10, 2020, 3:09pm

I am pretty new to deep learning and the PyTorch API. When I try to build a ResNet-50 and train it on image attributes with binary (1 or -1) classes, I get the error "Target -1 is out of bounds" from nll_loss inside CrossEntropyLoss. I followed the tutorial, so I believe my ResNet-50 structure is built correctly. I tested it with a random input and it gives the expected output, a tensor of size 2.

Here is my training code:
def train_model(epoch):
    model_net.train()
    for batch_index, (input, labels) in enumerate(train_loader):
        labels = labels.view(batch_size)
        input, labels = input.to(device), labels.to(device)
        outputs = model_net(input)
        print(input.shape)
        print(outputs.shape)
        print(labels.shape)
        loss = lostFunction(outputs, labels)
        if batch_index % 2 == 0 or batch_index == len(train_loader) - 1:
            print('epoch {} batch {}/{} loss {:.3f}'.format(
                epoch, batch_index, len(train_loader) - 1, loss.item()))
        optimizer.zero_grad()  # Set gradients to zero
        loss.backward()  # From the loss we compute the new gradients
        optimizer.step()

This is the output:
torch.Size([20, 3, 218, 178])
torch.Size([20, 2])
torch.Size([20])

IndexError Traceback (most recent call last)
in ()
----> 1 train_model(0)

4 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
2216 .format(input.size(0), target.size(0)))
2217 if dim == 2:
-> 2218 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
2219 elif dim == 4:
2220 ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

IndexError: Target -1 is out of bounds.

This is the format of one test sample (image tensor, label tensor):
(tensor([[[-0.6794, -0.7137, -0.8164,  ..., -1.0048, -1.0048, -0.9705],
     [-0.6794, -0.7137, -0.8164,  ..., -1.0048, -1.0048, -0.9705],
     [-0.6794, -0.7137, -0.8164,  ..., -1.0048, -1.0048, -0.9705],
     ...,
     [ 0.3994,  0.2111,  0.0741,  ...,  1.6667,  1.6667,  1.4954],
     [ 0.2453,  0.2453,  0.2967,  ...,  1.5639,  1.6838,  1.7694],
     [ 0.2624,  0.2624,  0.3652,  ...,  1.5639,  1.6838,  1.7694]],

    [[-0.6176, -0.6527, -0.7752,  ..., -1.1604, -1.1604, -1.1253],
     [-0.6176, -0.6527, -0.7752,  ..., -1.1604, -1.1604, -1.1253],
     [-0.6176, -0.6527, -0.7752,  ..., -1.1604, -1.1604, -1.1253],
     ...,
     [ 0.2052,  0.0126, -0.4601,  ...,  1.8333,  1.8333,  1.6583],
     [ 0.0476,  0.0476, -0.2500,  ...,  1.7108,  1.8158,  1.9034],
     [ 0.0476,  0.0476, -0.2150,  ...,  1.7108,  1.8158,  1.9034]],

    [[-0.5147, -0.5495, -0.6018,  ..., -1.0550, -1.0550, -1.0201],
     [-0.5147, -0.5495, -0.6018,  ..., -1.0550, -1.0550, -1.0201],
     [-0.5147, -0.5495, -0.6018,  ..., -1.0550, -1.0550, -0.9853],
     ...,
     [ 0.3219,  0.1302, -0.2881,  ...,  2.1868,  2.2217,  2.0474],
     [ 0.1476,  0.1476, -0.1138,  ...,  2.0648,  2.2217,  2.3088],
     [ 0.1476,  0.1476, -0.0615,  ...,  2.0648,  2.2217,  2.3088]]]), tensor([-1]))

I really can't figure out why this happens. I have checked that the training data, label tensor, batch size, and number of output features are all correct. Can someone help me debug this?

Caruso · October 10, 2020, 6:01pm

Hi,

CrossEntropyLoss expects your labels to be class indices in the range [0, C-1], where C denotes the number of classes. In your case that means 0 or 1, not -1 or 1.
PS: If it's a single-label binary classification, you can also have a look at binary cross-entropy (nn.BCELoss or nn.BCEWithLogitsLoss).
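
A minimal sketch of the label fix, assuming the dataset stores labels as -1 or 1 as in the question. The helper name remap_labels is hypothetical, and the random logits only stand in for the real model output of shape [batch, 2]:

```python
import torch
import torch.nn as nn


def remap_labels(labels: torch.Tensor) -> torch.Tensor:
    """Map labels in {-1, 1} to class indices {0, 1} (hypothetical helper)."""
    # -1 -> 0 and 1 -> 1; CrossEntropyLoss needs int64 class indices.
    return (labels > 0).long()


torch.manual_seed(0)
labels = torch.tensor([-1, 1, 1, -1])  # labels as stored in the dataset
targets = remap_labels(labels)         # class indices in [0, C-1] with C = 2

logits = torch.randn(4, 2)             # stand-in for model_net(input)
loss = nn.CrossEntropyLoss()(logits, targets)
print(targets.tolist(), loss.item())
```

In the training loop from the question, the same remapping would be applied to labels before they are passed to the loss function, or once inside the dataset's __getitem__.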

Greetings.

flipflop · October 11, 2020, 1:13am

Thank you! This was a really stupid mistake. Another question: since the model outputs a tensor [value1, value2], which is the probability of each binary class, how do we know which value represents class 0 and which represents class 1? I am still a bit confused about this.

KFrank · October 11, 2020, 2:49am

Hi Chriseven!

The output of your model, [value1, value2], means whatever you
trained your model for it to mean.

As I understand it, you have structured your model as a two-class
multi-class classifier. (Your model outputs two values, and you use
CrossEntropyLoss.) Conceptually (although not in implementation)
this is the same as a binary-classification problem. (One output value,
and BCEWithLogitsLoss.)
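
To make this concrete: CrossEntropyLoss uses each target value as an index into the model's output, so after training with targets 0 and 1, output column 0 scores class 0 and column 1 scores class 1. The sketch below uses random logits as a stand-in for real model outputs and also illustrates the conceptual equivalence with the one-output binary case: the softmax probability of class 1 equals the sigmoid of the difference of the two logits.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 2)        # stand-in for model outputs, shape [batch, 2]

# Column i of the softmax is the predicted probability of class index i.
probs = F.softmax(logits, dim=1)
pred = logits.argmax(dim=1)       # predicted class index (0 or 1) per sample

# Binary view: one logit per sample, the difference of the two outputs.
binary_logit = logits[:, 1] - logits[:, 0]
p_class1 = torch.sigmoid(binary_logit)

print(torch.allclose(p_class1, probs[:, 1], atol=1e-6))
```

If the original labels were -1 and 1 and were remapped so that -1 became 0, then column 0 corresponds to the original -1 class.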

The following post answers your question, but in the language of a
binary-classification problem:

Best.

K. Frank

flipflop · October 11, 2020, 6:53am

Thank you for your answer, Frank. It helps! By the way, do you have any experience with CUDA out-of-memory issues on Google Colab? I am trying to train on around 150,000 images (7 KB each). It seems I don't have enough resources on the GPU, but training on the CPU is really too slow.