erroneus
(Devon)
November 9, 2017, 7:25pm
It's often nice to have your code exit cleanly, without a traceback, on a KeyboardInterrupt. However, when using a DataLoader with multiple workers, the KeyboardInterrupt doesn't seem to get caught correctly by a wrapping try/except (this is a known problem with multiprocessing). Is there any workaround for this?
Thanks
Edit:
The patch below doesn’t seem to handle KeyboardInterrupts. Never mind my old post.
Old post:
There’s a timeout option for DataLoader on the master branch, but you’ll have to install master from source until PyTorch updates the binaries.
    num_workers (int, optional): how many subprocesses to use for data
        loading. 0 means that the data will be loaded in the main process.
        (default: 0)
    collate_fn (callable, optional): merges a list of samples to form a mini-batch.
    pin_memory (bool, optional): If ``True``, the data loader will copy tensors
        into CUDA pinned memory before returning them.
    drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
        if the dataset size is not divisible by the batch size. If ``False`` and
        the size of dataset is not divisible by the batch size, then the last batch
        will be smaller. (default: False)
    timeout (numeric, optional): if positive, the timeout value for collecting a batch
        from workers. Should always be non-negative. (default: 0)
    worker_init_fn (callable, optional): If not None, this will be called on each
        worker subprocess with the worker id as input, after seeding and before data
        loading. (default: None)

    .. note:: By default, each worker will have its PyTorch seed set to
              ``base_seed + worker_id``, where ``base_seed`` is a long generated
              by main process using its RNG. You may use ``torch.initial_seed()`` to access
              this value in :attr:`worker_init_fn`, which can be used to set other seeds
              (e.g. NumPy) before data loading.
I’ve figured out a partial hack for this which works slightly better, but not perfectly. When initializing the DataLoader:
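import signal
from torch.utils.data import DataLoader

def worker_init(x):
    # Ignore SIGINT in each worker so that only the main process handles Ctrl-C.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

loader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, num_workers=n_workers,
                    worker_init_fn=worker_init)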
Instead of exploding the terminal, I get:
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fcb5c36c358>>
Traceback (most recent call last):
...
TypeError: 'NoneType' object is not callable
So: better, but not perfect.
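The mechanism behind this hack can be seen without PyTorch. Below is a minimal standard-library sketch, assuming a Unix-like system where the `fork` start method is available; `ignore_sigint`, `report_handler`, and `child_ignores_sigint` are hypothetical names used only for illustration. It starts one child process, makes the same `signal.signal(signal.SIGINT, signal.SIG_IGN)` call as `worker_init`, and reports back whether SIGINT is now ignored in the child.

```python
import multiprocessing as mp
import signal


def ignore_sigint(worker_id=None):
    # Same call as in worker_init: the child ignores Ctrl-C,
    # so only the parent process raises KeyboardInterrupt.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def report_handler(conn):
    # Runs in the child: install the handler, then report what is installed.
    ignore_sigint()
    conn.send(signal.getsignal(signal.SIGINT) == signal.SIG_IGN)
    conn.close()


def child_ignores_sigint():
    # Assumption: the "fork" start method exists (Linux, macOS).
    ctx = mp.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=report_handler, args=(child_conn,))
    proc.start()
    # Close the parent's copy so recv() fails fast if the child dies early.
    child_conn.close()
    result = parent_conn.recv()
    proc.join()
    return result
```

In a real script the parent would still wrap its loop over the loader in try/except KeyboardInterrupt and do its cleanup there; the workers simply stop competing for the signal.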
Thanks for this wonderful workaround! Worked like a charm. I didn’t even get the Exception ignored message.
The "Exception ignored" message is probably because I’m nesting some KeyboardInterrupt exception handling.
I wouldn’t be surprised if this has some unintended side effects, but so far it seems to work well. I’m worried, though, that the workers might not be terminating correctly. Perhaps someone more familiar with multiprocessing can help.
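On the worry about termination: one way to check, sketched here with plain `multiprocessing` rather than the DataLoader internals, is to stop the children explicitly and confirm that they have exited. This assumes a Unix-like system with the `fork` start method; `stubborn_worker`, `stop_workers`, and `demo` are hypothetical names, and PyTorch's own worker shutdown may well differ.

```python
import multiprocessing as mp
import signal
import time


def stubborn_worker():
    # Like the workers in this thread, this child ignores SIGINT (Ctrl-C).
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    while True:
        time.sleep(0.1)


def stop_workers(procs, timeout=5.0):
    # terminate() sends SIGTERM on Unix, which the children do not ignore.
    for p in procs:
        p.terminate()
    for p in procs:
        p.join(timeout)
    # Return the exit codes so the caller can verify nothing is left running
    # (an exit code of None would mean the process is still alive).
    return [p.exitcode for p in procs]


def demo():
    ctx = mp.get_context("fork")  # assumption: fork is available
    procs = [ctx.Process(target=stubborn_worker, daemon=True) for _ in range(2)]
    for p in procs:
        p.start()
    return stop_workers(procs)
```

Calling something like `stop_workers` from the parent's `except KeyboardInterrupt:` block would make the shutdown explicit instead of relying on the interpreter to clean up at exit.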