RNN many-to-many classification with cross entropy loss

Hello everyone,
I have a short question regarding RNN and CrossEntropyLoss:
I want to classify every time step of a sequence. For this I want to use a many-to-many classification with an RNN. So I forward my data (batch x seq_len x features) through my RNN and take every output. My target is already in the form (batch x seq_len), with the class index as each entry.
Now I use CrossEntropyLoss to train my net, and that's where I'm not sure about my solution.
CrossEntropyLoss wants the output as (N x C) and the target as (N). So what I do is reshape the output to (batch*seq_len x C) and the target to (batch*seq_len).
My minimal working example looks like this:

import torch

# batch of 3 sequences, sequence length 10, 5 features per timestep, to be classified into 3 classes
data = torch.rand((3, 10, 5))  # (batch x seq_len x features)
target = torch.randint(0, 3, (3, 10))  # (batch x seq_len), class index for each timestep

model = torch.nn.RNN(input_size=5, hidden_size=3, batch_first=True)
output, _ = model(data)  # (batch x seq_len x number_of_classes), since hidden_size == number of classes

# reshape output and target for cross entropy loss
output = output.reshape(output.size(0)*output.size(1), -1)  # (batch * seq_len x classes)
target = target.reshape(-1)  # (batch * seq_len), class index

criterion = torch.nn.CrossEntropyLoss()
loss = criterion(output, target)

Is this correct? Is the reshape() doing what I want to do?

Thanks for your help!

These would be the expected shapes for the “standard” multi-class classification use case, but, as described in the docs, the input and target can have additional dimensions (d1, d2, ... , dk).
For your use case you could thus use the model output as [batch_size, nb_classes, seq_len] and the target as [batch_size, seq_len].
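
As a rough sketch of this suggestion (not taken from the original posts): the snippet below reuses the shapes from the question and compares the flattened version with the permuted one. The variable names and the use of `permute(0, 2, 1)` to move the class dimension to position 1 are my own assumptions about how to get to [batch_size, nb_classes, seq_len]. With the default `reduction='mean'`, both variants should give the same loss value.

```python
import torch

torch.manual_seed(0)

# Shapes taken from the question; the names are illustrative only.
batch_size, seq_len, nb_features, nb_classes = 3, 10, 5, 3

data = torch.rand(batch_size, seq_len, nb_features)
target = torch.randint(0, nb_classes, (batch_size, seq_len))

model = torch.nn.RNN(input_size=nb_features, hidden_size=nb_classes, batch_first=True)
output, _ = model(data)  # [batch_size, seq_len, nb_classes]

criterion = torch.nn.CrossEntropyLoss()

# Variant A: flatten batch and time into one dimension, as in the question.
loss_flat = criterion(output.reshape(-1, nb_classes), target.reshape(-1))

# Variant B: keep the sequence dimension and move the classes to dim 1.
# input: [batch_size, nb_classes, seq_len], target: [batch_size, seq_len]
loss_perm = criterion(output.permute(0, 2, 1), target)

# Both variants average over the same batch_size * seq_len predictions.
same = torch.allclose(loss_flat, loss_perm)
```

So the reshape in the question should be fine as long as the class dimension stays last before flattening; the permuted version just avoids the manual reshaping.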

Hi @ptrblck,

Good day.
May I know whether it is possible for the data and the target to have different sequence lengths?
For example, seq for data is 60 and seq for target is 5.

Thanks

It could be possible to use them, but it depends on the actual use case.
I.e. in case the model outputs a tensor with a different sequence length than the target tensor, how would the loss be calculated? If your criterion handles inputs of different lengths, then it would work.
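
One possible way to make the lengths match, purely as an illustration: if the 5 target steps are meant to correspond to the last 5 time steps of the 60-step input, you could slice the model output before passing it to the criterion. That alignment is an assumption on my side and depends entirely on your use case; the names below are made up for the example.

```python
import torch

torch.manual_seed(0)

# Sequence lengths from the follow-up question; the other sizes are illustrative.
batch_size, input_len, target_len = 3, 60, 5
nb_features, nb_classes = 5, 3

data = torch.rand(batch_size, input_len, nb_features)
target = torch.randint(0, nb_classes, (batch_size, target_len))

model = torch.nn.RNN(input_size=nb_features, hidden_size=nb_classes, batch_first=True)
output, _ = model(data)  # [batch_size, input_len, nb_classes]

# ASSUMPTION: the targets describe the last `target_len` time steps,
# so only those outputs are kept for the loss.
output_last = output[:, -target_len:, :]  # [batch_size, target_len, nb_classes]

criterion = torch.nn.CrossEntropyLoss()
# Move the class dimension to dim 1, as expected for inputs with extra dimensions.
loss = criterion(output_last.permute(0, 2, 1), target)
```

If the targets do not line up with specific time steps of the output, slicing like this would not be meaningful, and you would need a model or criterion that handles the mismatch in some other way.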