CrossEntropyLoss target size

Based on the posted shapes, your output has the shape [batch_size=100, nb_classes=28, seq_len=4], so nn.CrossEntropyLoss treats dim1 as the class dimension and expects a target of shape [batch_size=100, seq_len=4] containing class indices.
However, based on your model architecture, I assume you want to predict 4 classes, in which case the class dimension (dim1) should have size 4 rather than 28.
Since your output is 3-dimensional: are you working with a sequence of samples, or is the input being reshaped incorrectly somewhere?
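
To make the shape requirements concrete, here is a minimal sketch using random tensors. The sizes 100, 28, and 4 are taken from the shapes discussed above. The variable names and the 2-dimensional variant are only assumptions about what you might want, not your actual model.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

batch_size, nb_classes, seq_len = 100, 28, 4

# Case 1: sequence output of shape [batch_size, nb_classes, seq_len].
# The target holds one class index per time step: [batch_size, seq_len].
output = torch.randn(batch_size, nb_classes, seq_len)
target = torch.randint(0, nb_classes, (batch_size, seq_len))
loss_seq = criterion(output, target)

# Case 2 (assumption): one prediction per sample over 4 classes.
# The output is [batch_size, 4] and the target is [batch_size].
output_2d = torch.randn(batch_size, 4)
target_1d = torch.randint(0, 4, (batch_size,))
loss_flat = criterion(output_2d, target_1d)

# Mixing the two cases, i.e. a 3-dimensional output with a 1-dimensional
# target, triggers a target size error.
try:
    criterion(output, torch.randint(0, nb_classes, (batch_size,)))
    mismatch_raised = False
except (RuntimeError, ValueError):
    mismatch_raised = True

print(loss_seq.shape, loss_flat.shape, mismatch_raised)
```

If Case 2 is what you are after, check where the extra dimension comes from, e.g. by printing the activation shape after each layer in forward.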