Earlier you mentioned that there are 10 classes in total. Why are you casting the output to a size of 9? What is the shape of the target yb? Could you please tell me the shapes of y_hat coming out of the model and of yb originally (without any change in shape)?
In your code, for a given xb, what is the shape of yhat in the line
yhat = model(xb)
and what is the shape of yb for that same xb?
Tell me only these two things!
I've seen the issue: the model is giving an output of shape [batch, class prob], while the target is a 1D tensor.
Just solved this: I used nn.CrossEntropyLoss() instead of F.cross_entropy and passed ignore_index=9, reduction='mean' as arguments before computing the loss.
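For anyone hitting the same thing, here is a minimal sketch of that setup (the batch size, class count, and dummy tensors are just assumptions for illustration):

```python
import torch
import torch.nn as nn

# Dummy shapes for illustration: 10 classes, with class index 9 ignored by the loss.
batch_size, num_classes = 4, 10
y_hat = torch.randn(batch_size, num_classes)   # model output: [batch_size, num_classes]
yb = torch.tensor([0, 3, 9, 7])                # target: 1D tensor of class indices, [batch_size]

criterion = nn.CrossEntropyLoss(ignore_index=9, reduction='mean')
loss = criterion(y_hat, yb)                    # the sample with target 9 is excluded from the mean
print(loss)
```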
I assume you are using nn.NLLLoss as the criterion, since you are applying log_softmax as the last activation function.
Have you had a look at the solutions provided in this thread, i.e., checked the target shape and made sure it's [batch_size] without additional dimensions for a multi-class classification use case?
If so, could you print the model output shape as well as the target shape?
PS: you can post code snippets by wrapping them into three backticks ```
I do it like criterion(outputs, torch.max(labels, 1)[1]) and it works. However, criterion(outputs, labels.squeeze()) causes another problem: "RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)". Anyway, thank you a lot!
Using nn.CrossEntropyLoss or nn.NLLLoss there is no point in creating one-hot encoded targets.
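For illustration, a small sketch (dummy tensors assumed) of converting one-hot targets back into the class indices these criteria expect:

```python
import torch
import torch.nn as nn

# Dummy one-hot targets (sketch only) converted to class indices via argmax.
one_hot = torch.tensor([[0., 1., 0.],
                        [1., 0., 0.]])
targets = one_hot.argmax(dim=1)     # tensor([1, 0]) with shape [batch_size]

outputs = torch.randn(2, 3)         # raw logits: [batch_size, num_classes]
loss = nn.CrossEntropyLoss()(outputs, targets)
print(loss)
```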
^ I ended up switching to ordinally encoded labels based on this.
However, I still have num_classes == num_neurons in my output layer because (see the sketch below these points):
You get softmax predictions, i.e. a probability for each class.
More neurons allow for more parameters/edges, i.e. more information flowing into the output layer. You've probably got many neurons in your last hidden layer, and they would struggle to jam information about all of the labels into a single neuron. It lets the activations spread out?
Output activations are closer to 1, so less chance of exploding gradients?
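Here is a rough sketch of that layout (the layer sizes below are placeholders, not taken from the thread):

```python
import torch
import torch.nn as nn

# Placeholder sizes: a last hidden layer feeding one output neuron per class.
num_features, num_hidden, num_classes = 128, 64, 10

model = nn.Sequential(
    nn.Linear(num_features, num_hidden),
    nn.ReLU(),
    nn.Linear(num_hidden, num_classes),   # one logit per class -> [batch_size, num_classes]
)
logits = model(torch.randn(32, num_features))
print(logits.shape)                        # torch.Size([32, 10])
```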
Also, don't forget that you can write your own loss function. This would allow you to swap in the Keras class you really want.
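If you go that route, here is a minimal sketch of a custom loss module (the class name and scaling factor are made up for illustration, not an existing Keras class):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical example: a custom loss wrapping cross entropy with a scaling factor.
class MyScaledCrossEntropy(nn.Module):
    def __init__(self, scale=1.0):
        super().__init__()
        self.scale = scale

    def forward(self, outputs, targets):
        # outputs: [batch_size, num_classes] logits, targets: [batch_size] class indices
        return self.scale * F.cross_entropy(outputs, targets)

criterion = MyScaledCrossEntropy(scale=0.5)
loss = criterion(torch.randn(4, 10), torch.randint(0, 10, (4,)))
print(loss)
```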
The model output tensor is expected to have the shape [batch_size, nb_classes], so using an output neuron for each class (in a multi-class or multi-label classification use case) is the right approach. The targets should only be passed as class indices (not one-hot encoded) to nn.CrossEntropyLoss or nn.NLLLoss.
Hi @cerkauskas, can you please tell me what torch.max(labels, 1)[1] is actually doing? When I print my target dimension, it is exactly batch_size * 1, but it still keeps throwing this error. Thank you!
Here is some code to give you an intuition for the function and its outputs.
```python
import torch

batch_size = 100
num_choices = 5
dummy_outputs = torch.rand(batch_size, num_choices)

print(torch.max(dummy_outputs, dim=1)[0])  # print the max values along dim=1
print(torch.max(dummy_outputs, dim=1)[1])  # print the max indices along dim=1
```
You will also need to consider what your loss function needs as an input. What kind of loss function are you using?
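If it turns out to be nn.CrossEntropyLoss, here is a sketch (dummy shapes assumed) of turning a [batch_size, 1] target into the [batch_size] index tensor it expects:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
outputs = torch.randn(100, 5)               # [batch_size, num_classes] logits
labels = torch.randint(0, 5, (100, 1))      # [batch_size, 1] targets, as described above

loss = criterion(outputs, labels.squeeze(1))  # squeeze the extra dim -> [batch_size]
print(loss)
```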