m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 1 x 3
input = torch.tensor([[0,0,0]], requires_grad=True, dtype=torch.float)
# each element in target has to have 0 <= value < C
target = torch.tensor([1])
output = loss(m(input), target)
print(output)
Output: tensor(1.0986, grad_fn=<NllLossBackward>)

I read the documentation, but it’s not clear to me. Can someone explain the math behind this example?

In your example the output has the same “probability” for all three classes, i.e. the logits all have the same value.
Their probabilities should therefore be approx. [0.33, 0.33, 0.33].
Since you are using LogSoftmax, we can check whether this is true by calling exp on its output (thus getting rid of the log), which gives roughly [0.3333, 0.3333, 0.3333].

You will get the same values every time you pass the same logits into LogSoftmax.
Now we just have to pick the log probability at the index given by target (here log(1/3) = -1.0986), multiply it by -1, and we end up with a loss value of 1.0986.
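The steps above can be reproduced by hand. This is a minimal sketch using the same all-zero logits and target as the question; the variable names `log_probs`, `probs`, `manual` and `builtin` are only illustrative.

```python
import math

import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()

logits = torch.tensor([[0.0, 0.0, 0.0]])  # N x C = 1 x 3, all logits equal
target = torch.tensor([1])

log_probs = m(logits)          # log(1/3) for every class
probs = torch.exp(log_probs)   # undo the log: 1/3 for every class

# Pick the log probability of the target class and flip the sign.
manual = -log_probs[0, target[0]]
builtin = loss(log_probs, target)

print(probs)
print(manual.item(), builtin.item(), math.log(3))
```

All three printed numbers should agree, since -log(1/3) = log(3).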

I don’t know if it’s right to post this question here, but I’ll try: why is the output of this piece of code tensor(-0.3150)? I was expecting it to be (-1/2) * (1 * ln(0.88) + 0 * ln(0.12) + 1 * ln(0.51) + 0 * ln(0.49)), which would be equal to 0.4005, not -0.3150.
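The code behind this question is not shown in the thread, so the following is only a sketch under assumptions: it takes the probabilities from the formula above (0.88/0.12 and 0.51/0.49) with class 0 as the target for both samples. It illustrates that nn.NLLLoss does not take the log itself: it only picks the target entries, negates them, and averages. Passing probabilities directly therefore gives a negative number, while passing their log gives the value the formula predicts.

```python
import torch
import torch.nn as nn

loss = nn.NLLLoss()

# Assumed inputs, reconstructed from the formula in the question.
probs = torch.tensor([[0.88, 0.12],
                      [0.51, 0.49]])
target = torch.tensor([0, 0])  # assumption: class 0 is the target in both rows

# Probabilities passed directly: -(0.88 + 0.51) / 2, a negative value.
without_log = loss(probs, target)

# Log probabilities passed: -(ln(0.88) + ln(0.51)) / 2, roughly 0.40.
with_log = loss(torch.log(probs), target)

print(without_log.item(), with_log.item())
```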

Ahh ok, thanks for the answer! What I am trying to figure out actually is how the nn.NLLLoss works for multidimensional tensors, but I couldn’t find an example. Could you give me a simple example on how that loss is calculated for a 2D or 3D tensor?
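As a sketch of the multidimensional case: nn.NLLLoss accepts an input of shape (N, C, d1, ...) together with a target of shape (N, d1, ...), where dimension 1 is always the class dimension. The sizes and the random data below are made up for illustration. With the default mean reduction, the result is the negative target log probability averaged over every position.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

N, C, L = 2, 3, 4                        # batch, classes, extra dimension
logits = torch.randn(N, C, L)
log_probs = F.log_softmax(logits, dim=1)  # normalize over the class dim
target = torch.randint(0, C, (N, L))      # one class index per position

builtin = nn.NLLLoss()(log_probs, target)

# Manual version: gather the log probability of the target class at each
# position, negate, and average over all N * L positions.
picked = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)  # shape (N, L)
manual = -picked.mean()

print(builtin.item(), manual.item())
```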

Hi Frank, so I took a simpler example to try to understand:

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input is of size N x C = 2 x 2
input = torch.randn(2, 2, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0])
output = loss(m(input), target)
print(m(input))
print(target)
print(output)

So which formula produces the value 0.4428 for the given input and target? It’s not really clear to me how l1,…,ln from the l(x, y) formula (official NLLLoss documentation) are calculated when the loss weight is None.

weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones.

Optional means that the argument is allowed to be None, i.e., absent.
In such a case there is no reweighting (or, equivalently, the reweighting
factors are all equal to 1).
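To make this concrete, here is a small sketch for the 2 x 2 case with target [1, 0]. The seed and variable names are arbitrary, so the numbers will not match the 0.4428 from the question, but the formula is the same: each l_n is the negated log probability of sample n's target class, and the loss is their mean. Passing a weight of all ones gives the same result as leaving weight unset.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

m = nn.LogSoftmax(dim=1)
x = torch.randn(2, 2)            # N x C = 2 x 2
target = torch.tensor([1, 0])
log_probs = m(x)

# l_n = -log_probs[n, target[n]] for each sample n
l = -log_probs[torch.arange(2), target]
manual = l.mean()

default = nn.NLLLoss()(log_probs, target)                     # weight=None
ones = nn.NLLLoss(weight=torch.ones(2))(log_probs, target)    # all weights 1

print(l, manual.item(), default.item(), ones.item())
```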

X, y = data
#Make the zero_grad()
net.zero_grad()
#output of the loss created
input = net(X.view(-1, 784)
#calculating the loss
loss = F.nll_loss(input, y) # calc and grab the loss value
loss.backward() # apply this loss backwards thru the network's parameters
optimizer.step() # attempt to optimize weights to account for loss/gradients

print(loss) # print loss. We hope loss (a measure of wrong-ness) declines!

But this keeps raising an error at loss = F.nll_loss(input, y), and the error is a SyntaxError. How do I solve it?
I am new to deep learning, but I have experience with sklearn

What kind of error are you seeing?
Could you post the complete error message with the stack trace here, please?

Often F.nll_loss creates a shape mismatch error, since for a multi-class classification use case the model output is expected to contain log probabilities (applied F.log_softmax as the last activation function on the output) and have the shape [batch_size, nb_classes]. The target should be a LongTensor in the shape [batch_size] and should contain the class indices in the range [0, nb_classes-1].
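The expected shapes can be checked with a small sketch. The single linear layer standing in for `net`, the batch size and the random data are all hypothetical. Separately, note that the snippet as posted has an unclosed parenthesis in `net(X.view(-1, 784)`; an unclosed parenthesis by itself raises a SyntaxError, which older Python versions typically report on the following line.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

batch_size, nb_classes = 4, 10
net = nn.Linear(784, nb_classes)       # hypothetical stand-in for the model

X = torch.randn(batch_size, 1, 28, 28)            # fake image batch
y = torch.randint(0, nb_classes, (batch_size,))   # class indices, int64

# Model output as log probabilities with shape [batch_size, nb_classes].
# Note the balanced parentheses around X.view(-1, 784).
output = F.log_softmax(net(X.view(-1, 784)), dim=1)

loss = F.nll_loss(output, y)   # target has shape [batch_size]
loss.backward()

print(output.shape, y.shape, y.dtype, loss.item())
```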

For a single binary output with labels in [0, 1] you should use nn.BCEWithLogitsLoss instead of nn.NLLLoss.
The output of the model should then be a single value containing the logit and should be passed directly to the criterion without applying sigmoid on it.
You could also use sigmoid and nn.BCELoss but the numerical stability would be worse.
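A minimal sketch of the two options, using made-up logits and labels: both give the same loss value for well-behaved inputs, but nn.BCEWithLogitsLoss works on the raw logits and so avoids computing the sigmoid and the log separately.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

logits = torch.randn(5, 1)   # one raw logit per sample, no sigmoid applied
labels = torch.tensor([[0.0], [1.0], [1.0], [0.0], [1.0]])  # float targets

# Preferred: pass the logits directly to the criterion.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, labels)

# Alternative: apply sigmoid first, then nn.BCELoss (less stable numerically).
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), labels)

print(loss_from_logits.item(), loss_from_probs.item())
```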

But how can I set the loss explicitly here? Has it been picked up automatically from the model?

Also, how does it arrive at the value 0.3405?

model = tr.XLMRobertaForSequenceClassification.from_pretrained(
    "/home/stb/AIML/model_mlm_vocab_exp1_20epocs",
    problem_type="single_label_classification",
    num_labels=2,
    ignore_mismatched_sizes=True,
    id2label={0: 'negative', 1: 'positive'},
)

training_args = tr.TrainingArguments(
    # report_to='wandb',
    output_dir='/home/stb/AIML/results_vocab_ext_exp1',  # output directory
    overwrite_output_dir=True,
    num_train_epochs=10,  # total number of training epochs
    per_device_train_batch_size=10,  # batch size per device during training
    per_device_eval_batch_size=10,  # batch size for evaluation
    learning_rate=2e-5,
    warmup_steps=200,  # number of warmup steps for learning rate scheduler
    weight_decay=0.01,  # strength of weight decay
    logging_dir='./logs_exp1',  # directory for storing logs
    logging_steps=6000,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    fp16=True,
    run_name="run1",
    gradient_accumulation_steps=20,
)

I’m not sure which higher-level library you are using, but I would guess that the loss is calculated internally?
If so, you would need to check the internal implementation of this library.
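As a hedged pointer for that check: to the best of my knowledge, Hugging Face sequence-classification models configured with problem_type="single_label_classification" compute a standard cross-entropy loss on the logits internally, but this should be verified against the library source for the installed version. The sketch below uses made-up logits and labels (not the model above) and only shows what such a cross-entropy loss computes: it is LogSoftmax followed by NLLLoss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

num_labels = 2
logits = torch.randn(10, num_labels)           # made-up classifier outputs
labels = torch.randint(0, num_labels, (10,))   # made-up class indices

# Cross entropy on raw logits ...
ce = F.cross_entropy(logits, labels)

# ... equals NLL loss on log-softmax outputs.
nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(ce.item(), nll.item())
```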