Help! adding data of the target subject to training performs worse

Hi Pytorch community,

I am running a leave-one-patient-out experiment with a CNN+RNN network I designed. Since my dataset is imbalanced, I use a sampler in the DataLoader during training.
Now I am adding a few labeled samples (e.g., 10) from the target subject to my training set, with the goal of improving performance on that subject. I would expect this to help, but in some situations performance actually drops. So my question is: how can I be sure the DataLoader is actually using the added target samples? I would expect that adding samples from the target subject is at least as good as not adding them, but never worse. Any help would be much appreciated.

I train my model from scratch, stop training at 20 epochs, and use BatchNorm with the PyTorch momentum parameter set to 0.01.
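As a side note, PyTorch's BatchNorm momentum works the opposite way from optimizer momentum: running statistics are updated as `running = (1 - momentum) * running + momentum * batch_stat`, so momentum=0.01 means the running stats adapt very slowly, which may matter when training for only 20 epochs. A small sketch illustrating the update (the layer size and input statistics here are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# PyTorch convention: running = (1 - momentum) * running + momentum * batch_stat
bn = nn.BatchNorm1d(8, momentum=0.01)

# A batch whose per-feature mean is roughly 10.
x = torch.randn(32, 8) * 5 + 10

bn.train()
_ = bn(x)

# running_mean starts at 0 and moves only 1% of the way toward ~10.
print(bn.running_mean.mean())
```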

This is a snippet of my code.

import torch
import pandas as pd
from torch.utils.data import DataLoader, ConcatDataset, WeightedRandomSampler

torch.manual_seed(0)  # the same seed, to ensure the same weight init

# train_df_aux is the training data without any labelled samples from the target subject.
# seeds are the labelled samples from the target subject; this is also an imbalanced
# subset, since I collect labelled samples until X positive-class samples are found.
train_df = pd.concat([train_df_aux, seeds])
train_df.reset_index(drop=True, inplace=True)

# train_data_trf1 is the train data with augmentation strategies applied.
train_data = ConcatDataset([train_data_ori, train_data_trf1])

# weights is built with custom functions so that batches are balanced during training.
sampler = WeightedRandomSampler(weights, len(weights))

# During training
kwargs = {'num_workers': hparams["num_workers"], 'pin_memory': True} if use_cuda else {}
train_loader = DataLoader(train_data, batch_size=hparams["batch_size"],
                          sampler=sampler, **kwargs)
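The weight construction is only described in a comment above; a common recipe for balanced sampling is inverse class frequency per sample. A minimal sketch, where the label Series is a hypothetical stand-in for the label column of train_df:

```python
import torch
import pandas as pd
from torch.utils.data import WeightedRandomSampler

# Hypothetical binary labels standing in for train_df's label column:
# 6 negatives, 2 positives.
labels = pd.Series([0, 0, 0, 0, 0, 0, 1, 1])

# Inverse class frequency: rarer classes get larger per-sample weights,
# so WeightedRandomSampler draws roughly balanced batches.
class_counts = labels.value_counts()
weights = labels.map(lambda c: 1.0 / class_counts[c]).to_numpy()
weights = torch.as_tensor(weights, dtype=torch.double)

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
```

Note that the added target-subject samples inherit whatever weight their class gets; if you want them drawn more often than other samples of the same class, you would have to upweight them explicitly.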

Thank you in advance!

If you just iterate through the DataLoader and print out each sample, do you see the labeled samples that you added?
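To go a step beyond printing, you can count how often the sampler actually draws the appended indices over many epochs. A toy sketch (the sizes and the 4x upweighting here are placeholders, not taken from your code):

```python
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)

# Toy setup: 20 "original" samples plus 5 appended "seed" samples (indices 20-24).
n_base, n_seed = 20, 5
n_total = n_base + n_seed

# Example weights: upweight the seed samples 4x.
weights = torch.ones(n_total)
weights[n_base:] = 4.0
sampler = WeightedRandomSampler(weights, num_samples=n_total, replacement=True)

# Count how often each index is drawn across 100 simulated epochs.
counts = torch.zeros(n_total)
for _ in range(100):
    for i in sampler:
        counts[i] += 1

seed_share = counts[n_base:].sum() / counts.sum()
print(f"fraction of draws that were seed samples: {seed_share:.3f}")
```

If the seed samples' share of draws matches their share of the total weight, the sampler is using them as configured.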

Hi nivek! Thank you for your reply. Yes, I have checked, and I do see the added samples!