Loss Function for Sequential Time Series Classification

I am playing around with the idea of doing an NLP-style prediction for time series data. In my data, labels often cluster together, so a ‘2’ label will often precede another ‘2’ label, and so forth. For that reason, instead of predicting each label in isolation, I thought it would be interesting to predict a sequence of them, where the output has access to the recent labels, as is common in translation tasks and NLP transformers. The issue I’m having is knowing which loss function would work.

In my case, I have 3 classes I am trying to predict, and would like to predict each class 24 timesteps into the future. Therefore, the input to the loss function would be something like (256, 24, 3) => (batch, predicted sequence, logits).

The PyTorch CrossEntropyLoss docs say the following: “The performance of this criterion is generally better when target contains class indices, as this allows for optimized computation.” I’m assuming that “class indices” means not one-hot encoding the targets and just keeping them as integer labels (in my case 0, 1, or 2).
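To make the "class indices" reading concrete, here is a minimal sketch comparing the two target formats. It assumes a PyTorch version recent enough to accept class-probability targets (1.10 or later, as far as I know); the tensor values and the names `idx` and `one_hot` are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)                        # (batch, classes)

# Target format 1: class indices, an integer (int64) tensor of shape (batch,)
idx = torch.tensor([0, 2, 1, 2])

# Target format 2: class probabilities, a float tensor with the same shape as logits
one_hot = F.one_hot(idx, num_classes=3).float()

loss_idx = F.cross_entropy(logits, idx)
loss_prob = F.cross_entropy(logits, one_hot)

# Both formats describe the same targets, so the two losses should agree
print(torch.allclose(loss_idx, loss_prob))
```

The index form skips building the one-hot tensor entirely, which is presumably the "optimized computation" the docs refer to.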

So that would mean that my ground truth matrix would be of shape (256, 24) or, to match dimensions, (256, 24, 1). Where am I misunderstanding how CrossEntropyLoss works? To run some tests, I used the following code.

import torch
import torch.nn.functional as F

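# Method 1
x = torch.randn(2, 10, 3)  # (2 examples, prediction length 10, 3 logits)
y = torch.tensor([[1, 2, 0, 0, 0, 1, 2, 0, 2, 1],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])  # placeholder second row (assumed values)
# F.cross_entropy(x, y)  # this call fails

# Method 2
x = torch.randn(10, 3)  # (10 timesteps, 3 logits)
y = torch.tensor([1, 0, 0, 0, 2, 2, 2, 2, 1, 0])  # class indices must lie in {0, 1, 2}
F.cross_entropy(x, y)  # this call works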

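from torch.nn import CrossEntropyLoss, Module

# Assumed definition; DEVICE is not defined anywhere in the post
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Myloss(Module):
    def __init__(self):
        super().__init__()
        self.loss_function = CrossEntropyLoss()

    def forward(self, y_pre, y_true):
        # Class-index targets must be integer (long) tensors, not float32
        y_true = y_true.type(torch.long).to(DEVICE)
        loss = self.loss_function(y_pre, y_true)

        return loss

loss = Myloss()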

I’m also curious as to why the second method works, and the first doesn’t.

So, I just changed the first method to be


and it worked. I think I just misunderstood the docs. If anyone could verify this for me that would be awesome!
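For anyone hitting the same error: cross entropy in PyTorch treats dimension 1 of the input as the class dimension, so an input of shape (N, C, L) pairs with index targets of shape (N, L). A (2, 10, 3) input is therefore read as 10 classes over 3 positions, which does not match a (2, 10) target. The sketch below is not the exact change referred to above; it shows two common ways to line the shapes up, with made-up tensors and sizes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, seq_len, num_classes = 2, 10, 3

logits = torch.randn(batch, seq_len, num_classes)          # model output: (N, L, C)
targets = torch.randint(0, num_classes, (batch, seq_len))  # class indices: (N, L)

# Option A: move the class dimension to position 1 -> (N, C, L)
loss_a = F.cross_entropy(logits.permute(0, 2, 1), targets)

# Option B: merge batch and time -> (N*L, C) logits with (N*L,) targets
loss_b = F.cross_entropy(logits.reshape(-1, num_classes), targets.reshape(-1))

# With the default mean reduction both average over all N*L predictions
print(torch.allclose(loss_a, loss_b))
```

Option B also explains why the second method works: a (10, 3) input with a (10,) target is already in the (N, C) and (N,) layout.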

I apologize for the stream of consciousness, but I think I verified how the loss function works for multiple batches and dimensions. I’ve explicitly written out a few dimensions so that anyone else with this question can see the internals of the cross entropy loss function.

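x = torch.tensor([[[100,0,100,0,100,0,0,0,100,100],   # example 1, logits for label 0
                   [0,100,0,0,0,100,100,100,0,0],     # example 1, logits for label 1
                   [0,0,0,100,0,0,0,0,0,0]],          # example 1, logits for label 2
                  [[100,100,100,100,100,100,100,100,100,100],  # example 2, logits for label 0
                   [0,0,0,0,0,0,0,0,0,0],             # example 2, logits for label 1
                   [0,0,0,0,0,0,0,0,0,0]]],           # example 2, logits for label 2
                 dtype=torch.float32)

y = torch.tensor([[0,1,0,2,0,1,1,1,0,0], [0,0,0,0,0,0,0,0,0,0]])

F.cross_entropy(x, y)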

This will give a loss of 0. As you can see, we have an input of dimension (2,3,10) and targets of dimension (2,10). Within each of the 2 ‘x’ examples, the first row holds the logits for the ‘0’ label, the second row for the ‘1’ label, and the third row for the ‘2’ label. The reason I put 100 is that cross entropy loss applies the softmax internally, so to get 0 loss I input a large value that causes the softmax to put essentially 100% confidence in the correct class.
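The same internals can be checked on random inputs. The sketch below recomputes the loss by hand as a log-softmax over the class dimension followed by picking out the log-probability of each target label; it assumes the default mean reduction, and the tensors are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 10)            # logits: (N, C, L)
y = torch.randint(0, 3, (2, 10))     # class indices: (N, L)

# Softmax (in log space) over the class dimension, dim=1
log_probs = F.log_softmax(x, dim=1)

# For every (example, timestep), select the log-probability of the true class
picked = log_probs.gather(1, y.unsqueeze(1)).squeeze(1)   # (N, L)

manual = -picked.mean()              # negative log-likelihood, averaged over N*L
builtin = F.cross_entropy(x, y)

print(torch.allclose(manual, builtin))
```

When the true class has a logit far above the others, its log-probability is close to 0, which is why the hand-built example gives a loss of 0.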
