Help! adding data of the target subject to training performs worse

Hi Pytorch community,

I am running a leave-one-patient-out experiment with a CNN+RNN network I designed. Since my dataset is imbalanced, I use a sampler in the DataLoader during training.
Now I am adding a few labeled samples (e.g., 10) from the target subject to my training set, with the goal of improving performance on that subject. I would expect this to help, but in some situations performance actually drops. So my question is: how can I be sure the DataLoader is actually using the added target samples? I would expect that adding samples from the target subject is at least as good as not adding them, but never worse. Any help would be much appreciated.

I train my model from scratch, stop training at 20 epochs, and use BatchNorm with the PyTorch momentum parameter set to 0.01.
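As a side note, PyTorch's BatchNorm momentum works the opposite way from optimizer momentum: running statistics are updated as `running = (1 - momentum) * running + momentum * batch_stat`, so momentum=0.01 means the running stats adapt very slowly, which may matter when training for only 20 epochs. A small sketch illustrating the update (the layer size and input statistics here are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# PyTorch convention: running = (1 - momentum) * running + momentum * batch_stat
bn = nn.BatchNorm1d(8, momentum=0.01)

# A batch whose per-feature mean is roughly 10.
x = torch.randn(32, 8) * 5 + 10

bn.train()
_ = bn(x)

# running_mean starts at 0 and moves only 1% of the way toward ~10.
print(bn.running_mean.mean())
```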

This is a snippet of my code.

import torch
import pandas as pd
from torch.utils.data import DataLoader, ConcatDataset, WeightedRandomSampler

torch.manual_seed(0)  # the same seed, to ensure the same weight init

# train_df_aux is the training data without any labelled samples from the target subject.
# seeds are the labelled samples from the target subject; this is also an imbalanced
# subset, since I collect labelled samples until X positive-class samples are found.
train_df = pd.concat([train_df_aux, seeds])
train_df.reset_index(drop=True, inplace=True)

# train_data_trf1 is the train data with augmentation strategies applied.
train_data = ConcatDataset([train_data_ori, train_data_trf1])

# weights is built with custom functions so that batches are balanced during training.
sampler = WeightedRandomSampler(weights, len(weights))

# During training
kwargs = {'num_workers': hparams["num_workers"], 'pin_memory': True} if use_cuda else {}
train_loader = DataLoader(train_data, batch_size=hparams["batch_size"],
                          sampler=sampler, **kwargs)
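The weight construction is only described in a comment above; a common recipe for balanced sampling is inverse class frequency per sample. A minimal sketch, where the label Series is a hypothetical stand-in for the label column of train_df:

```python
import torch
import pandas as pd
from torch.utils.data import WeightedRandomSampler

# Hypothetical binary labels standing in for train_df's label column:
# 6 negatives, 2 positives.
labels = pd.Series([0, 0, 0, 0, 0, 0, 1, 1])

# Inverse class frequency: rarer classes get larger per-sample weights,
# so WeightedRandomSampler draws roughly balanced batches.
class_counts = labels.value_counts()
weights = labels.map(lambda c: 1.0 / class_counts[c]).to_numpy()
weights = torch.as_tensor(weights, dtype=torch.double)

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
```

Note that the added target-subject samples inherit whatever weight their class gets; if you want them drawn more often than other samples of the same class, you would have to upweight them explicitly.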

Thank you in advance!

If you just iterate through the DataLoader and print out each sample, do you see the labeled samples that you added?
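To go a step beyond printing, you can count how often the sampler actually draws the appended indices over many epochs. A toy sketch (the sizes and the 4x upweighting here are placeholders, not taken from your code):

```python
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)

# Toy setup: 20 "original" samples plus 5 appended "seed" samples (indices 20-24).
n_base, n_seed = 20, 5
n_total = n_base + n_seed

# Example weights: upweight the seed samples 4x.
weights = torch.ones(n_total)
weights[n_base:] = 4.0
sampler = WeightedRandomSampler(weights, num_samples=n_total, replacement=True)

# Count how often each index is drawn across 100 simulated epochs.
counts = torch.zeros(n_total)
for _ in range(100):
    for i in sampler:
        counts[i] += 1

seed_share = counts[n_base:].sum() / counts.sum()
print(f"fraction of draws that were seed samples: {seed_share:.3f}")
```

If the seed samples' share of draws matches their share of the total weight, the sampler is using them as configured.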

Hi nivek! Thank you for your reply. Yes, I have checked, and I do see the added samples!