Balancing time series data

So I’m working with time series data using a GRU and already explained a little bit about what I intend to do in this post.
But the short explanation is:
I have data on users, which is basically a dataframe for each user, with each row representing that user’s activity in a given minute. And I want to predict whether an event might occur at some point.
I have around 1k samples of timelines that show the event and around 5k samples of timelines that don’t.
Now I would like to adjust for the imbalance in the dataset.
Usually I would do that by, for example, passing class weights to CrossEntropyLoss.
But here I have a problem:
I get an output that contains a prediction for every time step in my data.
So if my data looks like this:

Minute  Feature_1  Feature_2  Feature_3  ...  Target_0  Target_1
-4      0.4        0.23       0.64            1         0
-3      0.24       0.23       0.64            1         0
-2      0.34       0.1        0.64            1         0
-1      0.56       0.2        0.64            1         0
 0      0.64       0.3        0.64            0         1

With Target_0 and Target_1 being the labels.
My output will look like this (hopefully):

0.1    0.9   
0.1    0.9  
0.3    0.7
0.3    0.7
0.7    0.3

Now, the time series have different lengths, and I have different amounts of them in terms of whether they contain the event I’m looking for or not.
What would be the best method of balancing my classes here?

Because I can’t just use CrossEntropyLoss, as it for some reason does not support my dimensions.
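
(For reference, the usual class weighting I mean is something like this, with illustrative weights:)

import torch
import torch.nn as nn

# One weight per class; the rarer class gets the larger weight
class_weights = torch.tensor([0.2, 1.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(8, 2)            # [batch, num_classes]
targets = torch.randint(0, 2, (8,))   # class indices, not one-hot
loss = criterion(logits, targets)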

Your example output seems to be a binary classification output, so you could use nn.BCELoss after applying sigmoid to your output.
However, for more numerical stability I would recommend letting the model return raw logits and using nn.BCEWithLogitsLoss instead. This will also give you the ability to set pos_weight to counter the effect of the imbalanced dataset.
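
A minimal sketch (the counts are just taken from your description of ~1k timelines with the event vs. ~5k without):

import torch
import torch.nn as nn

# Approximate counts based on the description above
nb_positive_samples = 1000
nb_negative_samples = 5000

# pos_weight scales the loss contribution of the positive targets
pos_weight = torch.tensor([nb_negative_samples / nb_positive_samples])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 4, 1)                     # raw logits, no sigmoid
targets = torch.randint(0, 2, (8, 4, 1)).float()  # float targets for BCE
loss = criterion(logits, targets)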


So with that, I have adjusted my code as follows:

batch_size = 8
layer_size = 64
learning_rate = 5e-4
class_weights = torch.tensor([0.7, 0.3], dtype=torch.float64)
loss_fn = nn.BCEWithLogitsLoss(reduction='none', pos_weight=class_weights)

class MY_FIRST_GRU(nn.Module):
    def __init__(self):
        super(MY_FIRST_GRU, self).__init__()
        self.gru = nn.GRU(input_size=32, 
                          hidden_size=20, 
                          num_layers=4,
                          batch_first=True)  # Note that "batch_first" is set to "True"
        self.l_out = nn.Linear(in_features=20*1,
                               out_features=2)
 
    def forward(self, batch):
        x, x_length, _ = batch
        x_pack = pack_padded_sequence(x, x_length, batch_first=True).float()
        packed_x, hidden = self.gru(x_pack)
        output_padded, input_sizes = pad_packed_sequence(packed_x, batch_first=True)
        output = self.l_out(output_padded)
        return output

def train(train_loader):
    dl_model.train()
    total_loss = 0
    correct = 0
    for data_list in train_loader:
        # data_list:
        # 0 = padded input sequences
        # 1 = sequence lengths
        # 2 = labels
        optimizer.zero_grad()
        output = dl_model(data_list) # shape = [8, 4, 2]
        y = data_list[2]
        lossmask = create_seq_mask(data_list[1], device)
        loss = (loss_fn(output, y.float()) * lossmask).sum() / lossmask.sum()
        loss.backward()
        total_loss += loss.item()
        with torch.no_grad():
            pred = output[0] 
        correct += pred.eq(y.long()).sum().item()
        optimizer.step()
    return total_loss / len(train_loader), correct / len(train_loader.dataset)


def test(loader):
    dl_model.eval()
    actuals = []
    probabilities = []
    correct = 0
    for data_list in loader:
        with torch.no_grad():
            output = dl_model(data_list)
            y = data_list[2]
            pred = output[0]
            actuals.extend(y.cpu().detach().numpy())
        correct += pred.eq(y.long()).sum().item()
    return correct / len(loader.dataset), actuals 
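
(create_seq_mask isn’t shown above; a minimal sketch of a helper matching how it’s used in train(), as an assumption rather than the original code, would be:)

import torch

# Sketch of a mask helper matching the usage in train() above:
# 1 for valid time steps, 0 for padding.
def create_seq_mask(lengths, device):
    lengths = torch.as_tensor(lengths, device=device)
    max_len = int(lengths.max())
    # Compare a [1, max_len] position row against each sequence length
    mask = torch.arange(max_len, device=device)[None, :] < lengths[:, None]
    return mask.unsqueeze(-1).float()  # [batch, max_len, 1], broadcasts over classes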

And as far as I can tell, the loss looks like it’s minimising in a way that seems reasonable.
But I still need to adjust my accuracy calculation; I’m thinking of something along the lines of the sketch below.
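
(An untested sketch; masked_accuracy is just a name I made up. The idea is to threshold the logits and only count predictions at valid, non-padded time steps:)

import torch

# Untested sketch: threshold the sigmoid probabilities at 0.5 and
# only count predictions at valid (non-padded) time steps.
def masked_accuracy(logits, targets, mask):
    # mask: [batch, max_len, 1] with 1 at valid steps, 0 at padding
    preds = (torch.sigmoid(logits) > 0.5).float()
    correct = (preds.eq(targets).float() * mask).sum()
    total = mask.sum() * targets.size(-1)  # valid steps x number of classes
    return (correct / total).item()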
But from your perspective, does this look ok (except for the acc calculation)?

The code looks generally alright.
However, the pos_weight should be given as a scalar tensor, e.g. as nb_negative_samples/nb_positive_samples, as seen in the docs.
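
Applied to the code above, that would be a sketch like this (using the approximate counts from the first post):

import torch
import torch.nn as nn

nb_positive_samples = 1000   # ~1k timelines with the event
nb_negative_samples = 5000   # ~5k timelines without it
pos_weight = torch.tensor(nb_negative_samples / nb_positive_samples)  # scalar tensor

loss_fn = nn.BCEWithLogitsLoss(reduction='none', pos_weight=pos_weight)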