Hi,

I am trying to implement a token-level tagging model. I have two sentences with 14 and 9 words; I padded the shorter one with the `utils.rnn.pad_sequence` function and obtained log probabilities (with `nn.LogSoftmax()`) for each word:

```
pred_logits = self.log_softmax(feats)
# print(pred_logits.shape) => 2x14x2 (bs x max_token_num x size_of_label_set)
```

My target and mask tensors are:

```
target = torch.tensor([[1,0,1,1,0,1,1,0,1,0,1,1,0,1],[1,1,1,0,1,1,1,0,1,1,1,1,1,1]])
mask = torch.tensor([[1,1,1,1,1,1,1,1,1,1,1,1,1,1],[1,1,1,1,1,1,1,1,1,0,0,0,0,0]])
```

Now I want to calculate the loss using the `pred_logits` and `target` tensors as follows:

```
loss = self.nll_loss(pred_logits, target)
# average/reduce the loss according to the actual number of predictions (i.e. one prediction per token)
loss /= mask.float().sum()
return loss
```

However, I am getting the following error:

```
ValueError: Expected target size (2, 2), got torch.Size([2, 14])
```

How could I fix this problem?

EDIT

I can do it as follows without getting any error, but I wonder if there is a better way to do the same thing:

```
loss = 0
for i in range(target.shape[0]):
    loss_tmp = self.nll_loss(pred_logits[i, :, :], target[i, :])
    loss += loss_tmp
loss /= mask.float().sum()
return loss
```
```