Unable to load new dataset on DataLoader

Hi everyone,

I’ve been trying to load a new dataset using DataLoader but I get the following error:

Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/STN_LSTM/STN_LSTM.py", line 98, in <module>
    main()
  File "C:/Users/User/PycharmProjects/STN_LSTM/STN_LSTM.py", line 83, in main
    for i_batch, sample_batched in enumerate(train_loader):
  File "C:\Users\User\Anaconda3\envs\tensorflow\lib\site-packages\torch\utils\data\dataloader.py", line 451, in __iter__
    return _DataLoaderIter(self)
  File "C:\Users\User\Anaconda3\envs\tensorflow\lib\site-packages\torch\utils\data\dataloader.py", line 239, in __init__
    w.start()
  File "C:\Users\User\Anaconda3\envs\tensorflow\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\User\Anaconda3\envs\tensorflow\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\User\Anaconda3\envs\tensorflow\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\User\Anaconda3\envs\tensorflow\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\User\Anaconda3\envs\tensorflow\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object

I’ve followed the data loading tutorial (https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) and have successfully created a dataset class inheriting from `Dataset` that overrides `__len__` and `__getitem__`, so that they return the size of the dataset and an indexed sample, respectively. I can also iterate over this dataset and print samples from it.
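For reference, the structure I described is roughly the following (a minimal sketch; the class name, in-memory tensors, and constructor arguments are placeholders, not my actual dataset):

```python
import torch
from torch.utils.data import Dataset

class ClutteredMNIST(Dataset):
    """Hypothetical dataset holding preloaded image and label tensors."""

    def __init__(self, images, labels):
        self.images = images
        self.labels = labels

    def __len__(self):
        # Size of the dataset
        return len(self.images)

    def __getitem__(self, idx):
        # Indexed sample: (image, label) pair
        return self.images[idx], self.labels[idx]
```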

However, when I run the code below, that error pops up.

    train_loader = DataLoader(mnist_clutterred, batch_size=64,
                              shuffle=True, num_workers=4)
    print(len(train_loader))
    for i_batch, sample_batched in enumerate(train_loader):
        print(i_batch)

Any ideas on why I get that error?
Thanks in advance!

I’ve found the error: I am running on the CPU (i.e. device = torch.device("cpu")) and at the same time defining worker processes.

Removing `num_workers=4` fixed the error :slight_smile:
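For anyone hitting the same traceback: on Windows, `DataLoader` workers are started via `spawn`, which pickles the whole dataset object for each worker; pickling fails if the dataset holds an open file handle (the `_io.BufferedReader` in the error). Setting `num_workers=0` loads batches in the main process and sidesteps the pickling entirely. A minimal sketch of the working setup (the `TensorDataset` stands in for the real dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Dummy stand-in for the real dataset. With num_workers > 0 on Windows,
    # the dataset object must be picklable (no open file handles).
    data = TensorDataset(torch.zeros(128, 1, 28, 28),
                         torch.zeros(128, dtype=torch.long))

    # num_workers=0: batches are loaded in the main process, so nothing
    # is pickled and the "cannot serialize" error cannot occur.
    train_loader = DataLoader(data, batch_size=64, shuffle=True, num_workers=0)

    for i_batch, (images, labels) in enumerate(train_loader):
        print(i_batch, images.shape)

if __name__ == "__main__":  # required on Windows when num_workers > 0
    main()
```

If you do want multiple workers on Windows, keep the `if __name__ == "__main__":` guard and open any files lazily inside `__getitem__` rather than in `__init__`.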