Model inference inside custom dataset class


As part of my code, my model needs the features of another pre-trained model (call it feature-net) as input. To this end, I fine-tuned feature-net, and now I want to run inference with it inside my custom dataset class: in the __getitem__() method I load some images, run them through feature-net, and return the extracted features. However, when I use my custom dataset, I receive this error:

Traceback (most recent call last):
  File "", line 133, in <module>
    model  = train_model(model, criterion, optimizer_ft, exp_lr_scheduler, dataloaders, use_gpu, dataset_sizes, num_epochs=25)
  File "", line 46, in train_model
    for data in dataloaders[phase]:
  File "some/dir/.local/lib/python2.7/site-packages/torch/utils/data/", line 281, in __next__
    return self._process_next_batch(batch)
  File "some/dir/.local/lib/python2.7/site-packages/torch/utils/data/", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "some/dir/.local/lib/python2.7/site-packages/torch/utils/data/", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "some/dir/pytorch/pytorch_example/", line 156, in __getitem__
    features = extract_ResNet18_feature(self.feature_extractor, Variable(frames.cuda()))
  File "some/dir/.local/lib/python2.7/site-packages/torch/", line 69, in _cuda
    return new_type(self.size()).copy_(self, async)
  File "some/dir/.local/lib/python2.7/site-packages/torch/cuda/", line 384, in _lazy_new
  File "some/dir/.local/lib/python2.7/site-packages/torch/cuda/", line 140, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use Python 3.4+ and the 'spawn' start method

BTW, I have to use Python 2.x. Does anyone have an idea how to address this?


Did you use num_workers > 0 and did you use your model on the GPU?
If so, that is likely the problem: with num_workers > 0 the DataLoader loads data in forked worker processes, and, as the error message says, CUDA cannot be re-initialized in a forked subprocess.

With num_workers=0 it’ll work, but I’m not sure whether it will become the bottleneck in your application, since the data will then be loaded in the main process.
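A minimal sketch of the num_workers=0 setup, assuming a recent PyTorch (torch.no_grad, nn.Flatten) rather than the Python 2 / Variable-era version in the traceback. FeatureDataset and the tiny feature_net below are hypothetical stand-ins for the custom dataset and the fine-tuned feature-net:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

# Use the GPU if there is one; the sketch also runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class FeatureDataset(Dataset):
    """Hypothetical dataset that runs a feature extractor inside __getitem__."""

    def __init__(self, images, feature_extractor):
        self.images = images  # tensor of shape [N, C, H, W]
        self.feature_extractor = feature_extractor.to(device).eval()

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        frames = self.images[idx].unsqueeze(0).to(device)  # add a batch dimension
        with torch.no_grad():  # inference only, no autograd graph
            features = self.feature_extractor(frames)
        # Stays on `device`; with num_workers=0 collation runs in the main process.
        return features.squeeze(0)


# Hypothetical stand-in for the fine-tuned feature-net: any nn.Module works here.
feature_net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
dataset = FeatureDataset(torch.randn(10, 3, 32, 32), feature_net)

# num_workers=0: no worker processes are forked, so CUDA is only used in the main process.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([4, 8])
```

The price is that feature extraction and loading now run serially with training, which is exactly the possible bottleneck mentioned above.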

Another approach would be to compute all features beforehand, store them as a Tensor, and then just load them in a Dataset.
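A rough sketch of the pre-computing approach, again assuming a recent PyTorch; precompute_features, the stand-in feature_net, the random images/labels and the file name are all hypothetical. The features are computed once in the main process, saved with torch.save, and later wrapped in a TensorDataset, so __getitem__ never touches CUDA:

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def precompute_features(feature_extractor, images, batch_size=32):
    """Run the feature extractor once over all images and return a CPU tensor."""
    feature_extractor = feature_extractor.to(device).eval()
    chunks = []
    with torch.no_grad():
        for start in range(0, len(images), batch_size):
            batch = images[start:start + batch_size].to(device)
            chunks.append(feature_extractor(batch).cpu())
    return torch.cat(chunks)


# Hypothetical stand-ins for the fine-tuned feature-net, the images and the labels.
feature_net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
images = torch.randn(10, 3, 32, 32)
labels = torch.randint(0, 2, (10,))

features = precompute_features(feature_net, images)
path = os.path.join(tempfile.gettempdir(), "features.pt")  # hypothetical cache file
torch.save({"features": features, "labels": labels}, path)

# Later, e.g. in the training script: load the cached CPU tensors.
# No CUDA calls happen in the Dataset, so num_workers > 0 is fine again.
cached = torch.load(path)
dataset = TensorDataset(cached["features"], cached["labels"])
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=2)
```

This only works if feature-net stays frozen, since the cached features will not change during training.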

Thank you very much for your reply. Setting num_workers=0 solved the problem. If it becomes too slow, I will store the computed features as a Tensor.


In addition to @Haydnspass’s proposals: you could alternatively make the feature extractor part of your Module, rather than calling it in the Dataset or pre-computing the features.
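A minimal sketch of that idea, assuming a recent PyTorch and Python 3. ModelWithFeatureNet, the stand-in feature_net and the linear head are hypothetical. The Dataset would then only return raw images, so it never calls CUDA and num_workers > 0 keeps working; feature-net runs on the GPU inside the model's forward pass:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class ModelWithFeatureNet(nn.Module):
    """Hypothetical model: a frozen feature-net followed by a trainable head."""

    def __init__(self, feature_net, num_features, num_classes):
        super().__init__()
        self.feature_net = feature_net
        # Freeze feature-net so only the head is trained.
        for p in self.feature_net.parameters():
            p.requires_grad = False
        self.head = nn.Linear(num_features, num_classes)

    def train(self, mode=True):
        super().train(mode)
        # Keep feature-net's BatchNorm/Dropout in inference mode even while training.
        self.feature_net.eval()
        return self

    def forward(self, images):
        with torch.no_grad():  # no gradients needed through the frozen extractor
            features = self.feature_net(images)
        return self.head(features)


feature_net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
model = ModelWithFeatureNet(feature_net, num_features=8, num_classes=2).to(device)
model.train()
optimizer = torch.optim.SGD(model.head.parameters(), lr=0.01)

images = torch.randn(4, 3, 32, 32).to(device)  # would come from a DataLoader of raw images
logits = model(images)
loss = logits.sum()  # placeholder loss, just to show one optimization step
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape)  # torch.Size([4, 2])
```

If you later want to fine-tune feature-net jointly, drop the torch.no_grad() block and the freezing loop, and pass all parameters to the optimizer.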