Compatibility of subset dataset with disabled batch sampling?

I think there is a compatibility issue between disabled automatic batching and `Subset` datasets.
The use case: define a custom batch sampler, and split the dataset using PyTorch's `random_split` utility function.
Here's a minimal working example:

self.train_dataset, self.val_dataset, self.test_dataset = torch.utils.data.random_split(
            self.dataset, [100, 100, 100])

loader = DataLoader(
                    self.train_dataset,
                    sampler=BatchSampler(SequentialSampler(dataset), batch_size=self.hparams.batch_size, drop_last=False),
                    batch_size=None,  # automatic batching disabled; the sampler already yields whole batches
                    )

When iterating over the subset datasets, this is the error:

Exception has occurred: TypeError
list indices must be integers or slices, not list
  File "/path/utils/data/", line 257, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/path/utils/data/_utils/", line 46, in fetch
    data = self.dataset[possibly_batched_index]
  File "/path/utils/data/", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/path/utils/data/", line 345, in __next__
    data = self._next_data()
  File "/path/lib/python3.7/site-packages/pytorch_lightning/trainer/", line 251, in _evaluate
    for batch_idx, batch in enumerate(dataloader):
  File "/path/lib/python3.7/site-packages/pytorch_lightning/trainer/", line 843, in run_pretrain_routine
  File "/path/lib/python3.7/site-packages/pytorch_lightning/trainer/", line 477, in single_gpu_train
  File "/path/lib/python3.7/site-packages/pytorch_lightning/trainer/", line 704, in fit
  File "/path/", line 152, in main_train
  File "/path/", line 66, in main
    main_train(model_class_pointer, hyperparams, logger)
  File "/path/", line 161, in <module>
  File "/path/lib/python3.7/", line 85, in _run_code
    exec(code, run_globals)
  File "/path/lib/python3.7/", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/path/lib/python3.7/", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/path/lib/python3.7/", line 85, in _run_code
    exec(code, run_globals)
  File "/path/lib/python3.7/", line 193, in _run_module_as_main
    "__main__", mod_spec)

The problem is that `self.indices` of the `Subset` object is a plain Python list, and a Python list cannot be indexed with a list of indices.
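A quick sketch of the underlying problem: a plain Python list rejects a list index, while a tensor built from the same indices accepts it.

```python
import torch

indices = list(range(5))   # Subset.indices is a plain Python list
batch_idx = [0, 2, 4]      # a batch of indices as yielded by the sampler

try:
    indices[batch_idx]     # list indexed with a list -> TypeError
except TypeError as e:
    print(e)               # list indices must be integers or slices, not list

# a tensor accepts the same batch index
print(torch.as_tensor(indices)[batch_idx])  # tensor([0, 2, 4])
```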


I can reproduce it using the code sample.
Could you please create an issue to track this bug?

As a workaround you could use a `SubsetRandomSampler` and pass the shuffled indices to it.
Inside `Dataset.__getitem__` you might need to create a single index tensor via:

    index = torch.stack(index)
    x = self.data[index]  # assuming the samples live in self.data

since, with the `BatchSampler` approach, the `SubsetRandomSampler` will pass a list of tensors to the `Dataset`.
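A self-contained sketch of that workaround; the dataset class and its `data` attribute are made up for illustration:

```python
import torch
from torch.utils.data import Dataset, DataLoader, BatchSampler, SubsetRandomSampler

class TensorBatchDataset(Dataset):
    """Toy dataset whose __getitem__ accepts a whole batch of indices."""
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        if isinstance(index, list):     # the sampler yields a list of index tensors
            index = torch.stack(index)  # -> single 1-D index tensor
        return self.data[index]

    def __len__(self):
        return len(self.data)

data = torch.arange(12).float()
dataset = TensorBatchDataset(data)
sampler = BatchSampler(SubsetRandomSampler(torch.randperm(12)),
                       batch_size=4, drop_last=False)
loader = DataLoader(dataset, sampler=sampler, batch_size=None)

for batch in loader:
    print(batch.shape)  # torch.Size([4])
```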


I ended up creating a sequential subset sampler, since my dataset is already randomized and fetching samples sequentially is crucial for caching.
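For reference, such a sampler can be just a few lines; the name and shape here are illustrative, not the exact code used:

```python
from torch.utils.data import Sampler

class SequentialSubsetSampler(Sampler):
    """Yields the given subset indices in order, without shuffling,
    so samples are fetched sequentially (cache-friendly)."""

    def __init__(self, indices):
        self.indices = indices

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)

# wrap it in a BatchSampler exactly like SequentialSampler above
sampler = SequentialSubsetSampler([4, 5, 6, 7])
print(list(sampler))  # [4, 5, 6, 7]
```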