Hi @dhruvbird, I don't think I have a data imbalance issue. The raw data contains only the positive class (clicks); I sampled negatives based on popularity at an exact 1:1 ratio.
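For context, the sampling is roughly like this (a minimal sketch of popularity-proportional negative sampling; `item_ids`, `click_counts`, and `sample_negatives` are illustrative names, not my actual code):

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder popularity table: click counts per item from the raw (all-positive) data.
item_ids = np.arange(10_000)
click_counts = rng.integers(1, 1_000, size=item_ids.shape)
popularity = click_counts / click_counts.sum()

def sample_negatives(num_positives: int) -> np.ndarray:
    # Draw exactly as many negatives as positives (1:1 ratio),
    # with probability proportional to item popularity.
    return rng.choice(item_ids, size=num_positives, p=popularity)
```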
After negative sampling, I split the train and validation sets with TimeSeriesSplit, so the clicks in the validation set always occur after the click behaviour in the training set.
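Roughly how the split is produced (a minimal sketch using scikit-learn's `TimeSeriesSplit`; the sample count and `n_splits` value are placeholders, not my exact setup):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Placeholder: indices of click events, assumed sorted by timestamp ascending.
n_samples = 1_000_000
sample_indices = np.arange(n_samples)

tscv = TimeSeriesSplit(n_splits=5)
# Take the last (largest) fold as the final train/validation split.
train_idx, valid_idx = list(tscv.split(sample_indices))[-1]

# TimeSeriesSplit guarantees validation indices come strictly after
# training indices, i.e. validation clicks happen later in time.
assert train_idx.max() < valid_idx.min()
```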
In training, I shuffled the training data via the sampler (`shuffle` is `True` by default for `DistributedSampler`):
```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Setting num_workers as 4 * num of GPUs
train_dataloader = DataLoader(
    dataset_dict["train"], batch_size=batch_size, collate_fn=custom_collate_function,
    pin_memory=True,
    num_workers=num_workers,
    shuffle=False,  # the DistributedSampler handles shuffling
    sampler=DistributedSampler(dataset_dict["train"]),
)
valid_dataloader = DataLoader(
    dataset_dict["valid"], batch_size=batch_size, collate_fn=custom_collate_function,
    pin_memory=True,
    num_workers=num_workers,
    shuffle=False,
    sampler=DistributedSampler(dataset_dict["valid"], shuffle=False, drop_last=True),  # 504114 % (64 * 4) == 50 samples
)
```
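One caveat with this setup that's easy to miss: `DistributedSampler` only reshuffles with a different ordering each epoch if `set_epoch` is called at the start of each epoch (this is the documented PyTorch behaviour; the loop below is just a sketch with a placeholder `num_epochs`):

```python
for epoch in range(num_epochs):
    # Required so DistributedSampler reseeds its shuffle each epoch;
    # otherwise every epoch sees the same sample ordering.
    train_dataloader.sampler.set_epoch(epoch)
    for batch in train_dataloader:
        ...
```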
I think the 0.693 loss issue (0.693 ≈ ln 2, the BCE loss of a constant 0.5 prediction on balanced classes) probably has something to do with the last fully connected block of 4 linear layers with LeakyReLU activations, as mentioned by KFrank. When I replaced the whole fully connected block with an inner product operation, the 0.693 loss issue went away. But that surfaced another problem, which I posted separately: Does my loss curve show the model is overfitting?
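Concretely, the replacement looks roughly like this (a minimal sketch assuming a two-tower setup where `user_emb` and `item_emb` are the two towers' outputs; the names are illustrative, not my exact code):

```python
import torch

# Before: the logit came from a 4-layer MLP head over the concatenated embeddings.
# After: the logit is just the dot product of the two towers' outputs.
def score(user_emb: torch.Tensor, item_emb: torch.Tensor) -> torch.Tensor:
    # user_emb, item_emb: (batch_size, embedding_dim)
    # Returns one logit per user-item pair, fed to BCEWithLogitsLoss.
    return (user_emb * item_emb).sum(dim=-1)
```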