Hi!

I’m new to NLP and transformers. I have read the documentation on this (BERT, Attention Is All You Need, …).

I’m trying to implement a BERT model that I want to fine-tune, and which computes two different classifications:

- first classifier: 2 classes
- second classifier: 5 classes

My labels look like this:

```
y :
tensor([[[[1],
[0]]],
[[[1],
[0]]]])
```

The outputs after model(x) :

```
Output :
[tensor([[ 0.1207, -0.2359],
[ 0.4030, -0.0475]], grad_fn=<AddmmBackward>), tensor([[ 0.1071, 0.1679, 0.0090, -0.7056, -0.1793],
[ 0.1295, -0.0781, 0.3041, -0.6385, 0.1090]],
grad_fn=<AddmmBackward>)]
```

As you can see, I have 2 classifiers, with output sizes 2 and 5, and each label corresponds to a class index in the matching classifier's output.

If I apply the criterion like this:

```
for i in range(2):  # for each label
    loss += criterion(output[i], y[i])
```

For i = 0, I print output[0] and y[0]:

```
output[i] :
tensor([[ 0.4083, -0.1396],
[ 0.4059, -0.3079]], grad_fn=<AddmmBackward>)
y[i] :
tensor([[[1],
[1]]])
```

I get this error:

`ValueError: Expected input batch_size (2) to match target batch_size (1).`

I’m stuck on this, and I have the impression that I’m missing something about the format of y, but I’m not sure. Could someone help me with this? Thank you.
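
For reference, the shape mismatch can be reproduced in isolation. The sketch below assumes `criterion` is `torch.nn.CrossEntropyLoss` (the post does not say which loss is used) and uses random logits in place of the real model outputs. It also assumes the first axis of `y` indexes the classifier head, which is only a guess from the loop above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumption: the criterion is cross-entropy; the post does not name the loss.
criterion = nn.CrossEntropyLoss()

batch_size = 2
# Random stand-ins for the two classifier heads (2 and 5 classes).
output = [torch.randn(batch_size, 2), torch.randn(batch_size, 5)]

# Same layout as the labels printed in the post: shape (2, 1, 2, 1).
y = torch.tensor([[[[1], [0]]], [[[1], [0]]]])

# y[0] has shape (1, 2, 1), so its first dimension (1) is read as the batch size.
try:
    criterion(output[0], y[0])
    mismatch_raised = False
except (ValueError, RuntimeError) as err:
    mismatch_raised = True
    print(err)

# CrossEntropyLoss takes logits of shape (batch, classes) and
# class indices of shape (batch,).
# Assumption: the first axis of y indexes the classifier head. If it indexes
# the sample instead, the matching slice would be y[:, 0, i, 0].
loss = 0.0
for i in range(2):
    target = y[i].reshape(-1)  # (1, 2, 1) -> (2,)
    loss = loss + criterion(output[i], target)

print(loss)
```

With a 1-D target of length `batch_size` for each head, the loop runs and `loss` is a scalar that can be passed to `backward()` once the logits come from the real model. Whether `reshape(-1)` or a slice along another axis is the right choice depends on how the labels were stacked when `y` was built.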