Input/target size mismatch when training a pretrained Hugging Face BERT for downstream classification

I am training a BERT model on a downstream task: classifying movie genres. I am using a pretrained Hugging Face model (AlephBERT, since the data is in Hebrew).

When training, I get the following error:

ValueError: Expected input batch_size (3744) to match target batch_size (16).

This is my notebook: Google Colab

The error happens in the compute_loss function, while performing the cross_entropy step.
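
For context, here is a minimal, self-contained sketch of the kind of cross_entropy call that produces exactly this mismatch. The shapes, the linear classifier head, and the label count are illustrative assumptions, not my actual code (that is in the linked notebook):

```python
import torch
import torch.nn.functional as F

# Assumed shapes matching the error: batch size 16 and sequence length 234
# (3744 = 16 * 234); hidden size 768 and 5 genre labels are placeholders.
batch_size, seq_len, hidden_size, num_labels = 16, 234, 768, 5

# BertModel's last_hidden_state has one vector per *token*, not per example.
last_hidden_state = torch.randn(batch_size, seq_len, hidden_size)
classifier = torch.nn.Linear(hidden_size, num_labels)

logits = classifier(last_hidden_state)                # (16, 234, 5)
labels = torch.randint(0, num_labels, (batch_size,))  # (16,)

# Flattening everything but the class dim gives 16 * 234 = 3744 rows of
# logits against only 16 labels, which raises:
# ValueError: Expected input batch_size (3744) to match target batch_size (16).
loss = F.cross_entropy(logits.reshape(-1, num_labels), labels)
```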

My batch size is 16, but for some reason the BERT output has a different size. Since 3744 = 16 × 234, the extra factor looks like the sequence length, i.e. the model seems to be producing one prediction per token instead of one per example.
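
For comparison, this is how I understand the shapes should line up for sentence-level classification: the sequence dimension is pooled away before the classifier head (here by taking the [CLS] token at position 0). The head and shapes are again assumptions continuing the sketch above:

```python
# Pool to one vector per example before applying the classifier head.
cls_vectors = last_hidden_state[:, 0, :]  # (16, 768), the [CLS] token
logits = classifier(cls_vectors)          # (16, 5): one row per example
loss = F.cross_entropy(logits, labels)    # input and target batch sizes match
```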