I’m converting someone else’s (pytorch) Google Colab jupyter notebook to straight python (3.8.8) that I can run locally.
When I run the original code as-is locally, creating a multi-process DataLoader:
dataloaders = {'train': torch.utils.data.DataLoader(image_datasets['train'], batch_size=batch_size, shuffle=True, num_workers=5),
               'valid': torch.utils.data.DataLoader(image_datasets['valid'], batch_size=batch_size, shuffle=True, num_workers=5)}
it crashes with:
→ for batch_idx, (images, labels) in enumerate(dataloaders['valid']):
(Pdb) n
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\users\MyId\appdata\local\programs\python\python38\lib\multiprocessing\spawn.py", line 117, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "c:\users\MyId\appdata\local\programs\python\python38\lib\multiprocessing\spawn.py", line 127, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'ImageFolderWithPaths' on <module '__main__' (built-in)>
Traceback (most recent call last):
It seems to have something to do with the worker process failing to unpickle (deserialize) the dataset class from the calling context.
If I change the code to single-process (num_workers defaults to 0):
dataloaders = {'train': torch.utils.data.DataLoader(image_datasets['train']),
               'valid': torch.utils.data.DataLoader(image_datasets['valid'])}
and run it locally, it runs fine.
And yet the multi-process code runs fine on Google Colab.
Why? And what do I need to change to get the multi-process version to run locally?