Error with "ImbalancedDatasetSampler"

swap · September 11, 2020, 1:27pm

Hi,
I am dealing with imbalanced data (only 2% minority samples). I tried the “WeightedRandomSampler” approach, which works OK on my validation set but fails on an independent test set. I came across https://github.com/ufoym/imbalanced-dataset-sampler and wanted to try it on my data. The problem is that ‘ImbalancedDatasetSampler’ can’t figure out the labels in my TensorDataset object.

train_loader = data_utils.DataLoader(train_dataset, batch_size = BATCH_SIZE, sampler=ImbalancedDatasetSampler(train_dataset))

It raises the following error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-13-b7cc711025fa> in <module>
      2 train_loader = data_utils.DataLoader(train_dataset, 
      3                                      batch_size = BATCH_SIZE,
----> 4                                     sampler=ImbalancedDatasetSampler(train_dataset))

~/py_torch_sampler.py in __init__(self, dataset, indices, num_samples, callback_get_label)
     30         label_to_count = {}
     31         for idx in self.indices:
---> 32             label = self._get_label(dataset, idx)
     33             if label in label_to_count:
     34                 label_to_count[label] += 1

~/py_torch_sampler.py in _get_label(self, dataset, idx)
     51             return self.callback_get_label(dataset, idx)
     52         else:
---> 53             raise NotImplementedError
     54 
     55     def __iter__(self):

NotImplementedError:

Could someone tell me how I can solve this problem?

ptrblck · September 14, 2020, 9:01am

Based on the provided stack trace, it seems the NotImplementedError is raised by the ImbalancedDatasetSampler itself, so I would recommend creating an issue in the project’s GitHub repository for better visibility.
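
For reference, the traceback suggests the sampler falls through to `raise NotImplementedError` when it does not recognize the dataset type and no `callback_get_label` is supplied, and that the callback is called as `callback_get_label(dataset, idx)`. Below is a minimal sketch of such a callback for a TensorDataset, assuming the dataset was built as `TensorDataset(features, labels)` so the label is the second element of each item. The toy data, the name `get_label`, and the batch size are illustrative assumptions, and other versions of the library may expect a different callback signature. The second half shows the same inverse-frequency idea using only PyTorch's own `WeightedRandomSampler`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler


def get_label(dataset, idx):
    """Return the label of sample `idx` as a plain Python int.

    Assumes every item is a (features, label) tuple, which holds for
    TensorDataset(features, labels). Hypothetical helper, not part of
    the library.
    """
    return int(dataset[idx][1])


# Toy stand-in for the real data: 100 samples, 2 in the minority class.
features = torch.randn(100, 4)
labels = torch.zeros(100, dtype=torch.long)
labels[:2] = 1
train_dataset = TensorDataset(features, labels)

# With the third-party sampler, the callback would be passed by keyword
# (parameter name taken from the traceback above):
# sampler = ImbalancedDatasetSampler(train_dataset,
#                                    callback_get_label=get_label)

# Same idea with plain PyTorch: weight each sample by the inverse
# frequency of its class, then draw with replacement.
sample_labels = torch.tensor(
    [get_label(train_dataset, i) for i in range(len(train_dataset))]
)
class_counts = torch.bincount(sample_labels)
sample_weights = 1.0 / class_counts[sample_labels].double()

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)
train_loader = DataLoader(train_dataset, batch_size=16, sampler=sampler)
```

Either sampler only rebalances the batches drawn during training. The validation and test loaders would normally keep the original class distribution.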