So I’m working with time series data using a GRU and already explained a little bit about what I intend to do in this post.
But the short explanation is:
I have data on users: basically one dataframe per user, with each row representing that user's activity in a given minute. I want to predict whether an event will occur at some point.
I have around 1k samples of timelines that contain the event and around 5k samples of timelines that don't.
I would like to adjust for this imbalance in the dataset.
Usually I would do that by, for example, passing class weights to CrossEntropyLoss.
But here I have a problem:
I get an output that contains a prediction for every time step in my data.
So my data is a per-minute table per user, with Target_0 and Target_1 being the label columns.
My output will look like this (hopefully):
0.1 0.9
0.1 0.9
0.3 0.7
0.3 0.7
0.7 0.3
Now the time series have different lengths, and I have different amounts of them in terms of whether they contain the event I'm looking for or not.
What would be the best method of balancing my classes here?
Because I can't just use CrossEntropyLoss, as it apparently does not support my output dimensions.
Your example output looks like a binary classification output, so you could use nn.BCELoss after applying a sigmoid to your output.
However, for better numerical stability I would recommend letting the model return the raw logits and using nn.BCEWithLogitsLoss instead. This will also give you the ability to set pos_weight to counter the effect of the imbalanced dataset.
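For illustration, a minimal sketch of the two variants (the shapes here are just made up):

import torch
import torch.nn as nn

logits = torch.randn(8, 4, 2)                     # raw model output, e.g. [batch, time, targets]
target = torch.randint(0, 2, (8, 4, 2)).float()   # binary targets of the same shape

# variant 1: sigmoid + BCELoss
loss_bce = nn.BCELoss()(torch.sigmoid(logits), target)

# variant 2 (recommended): raw logits + BCEWithLogitsLoss, numerically more stable
loss_logits = nn.BCEWithLogitsLoss()(logits, target)

# both give (approximately) the same value, but only the second accepts pos_weight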
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

batch_size = 8
layer_size = 64
learning_rate = 5e-4
class_weights = torch.tensor([0.7, 0.3], dtype=torch.float64)
loss_fn = nn.BCEWithLogitsLoss(reduction='none', pos_weight=class_weights)
class MY_FIRST_GRU(nn.Module):
    def __init__(self):
        super(MY_FIRST_GRU, self).__init__()
        self.gru = nn.GRU(input_size=32,
                          hidden_size=20,
                          num_layers=4,
                          batch_first=True)  # Note that "batch_first" is set to "True"
        self.l_out = nn.Linear(in_features=20*1,
                               out_features=2)

    def forward(self, batch):
        x, x_length, _ = batch
        x_pack = pack_padded_sequence(x, x_length, batch_first=True).float()
        packed_x, hidden = self.gru(x_pack)
        output_padded, input_sizes = pad_packed_sequence(packed_x, batch_first=True)
        output = self.l_out(output_padded)
        return output
def train(train_loader):
    dl_model.train()
    total_loss = 0
    correct = 0
    for data_list in train_loader:
        # data_list:
        # 0 = data
        # 1 = data shape
        # 2 = label
        optimizer.zero_grad()
        output = dl_model(data_list)  # shape = [8, 4, 2]
        y = data_list[2]
        lossmask = create_seq_mask(data_list[1], device)
        loss = (loss_fn(output, y.float()) * lossmask).sum() / lossmask.sum()
        loss.backward()
        total_loss += loss.item()
        with torch.no_grad():
            pred = output[0]
            correct += pred.eq(y.long()).sum().item()
        optimizer.step()
    return total_loss / len(train_loader), correct / len(train_loader.dataset)
def test(loader):
    dl_model.eval()
    actuals = []
    probabilities = []
    correct = 0
    for data_list in loader:
        with torch.no_grad():
            output = dl_model(data_list)
            y = data_list[2]
            pred = output[0]
            actuals.extend(y.cpu().detach().numpy())
            correct += pred.eq(y.long()).sum().item()
    return correct / len(loader.dataset), actuals
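(create_seq_mask isn't shown above; the idea is that it builds a binary mask from the sequence lengths so padded time steps don't contribute to the loss. A minimal version could look roughly like this:)

def create_seq_mask(lengths, device):
    # lengths: 1D tensor/list with the valid length of each sequence in the batch
    lengths = torch.as_tensor(lengths, device=device)
    max_len = int(lengths.max())
    # mask[i, t] = True if time step t is a real (non-padded) step of sequence i
    mask = torch.arange(max_len, device=device)[None, :] < lengths[:, None]
    # add a trailing dim so it broadcasts over the two output units -> [batch, max_len, 1]
    return mask.unsqueeze(-1).float()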
And as far as I can tell, the loss looks like it's decreasing in a way that seems reasonable.
But I should still adjust my accuracy calculation.
From your perspective, does this look OK (apart from the accuracy calculation)?
The code looks generally alright.
However, pos_weight should be given as a scalar tensor, e.g. nb_negative_samples / nb_positive_samples, as described in the docs.
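For example, with the rough counts from your description (5k negative vs. 1k positive timelines, so just placeholders here):

# pos_weight as a single scalar: nb_negative_samples / nb_positive_samples
nb_negative_samples = 5000.
nb_positive_samples = 1000.
pos_weight = torch.tensor([nb_negative_samples / nb_positive_samples])  # tensor([5.])
loss_fn = nn.BCEWithLogitsLoss(reduction='none', pos_weight=pos_weight)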