Why do we use threading.Thread and multiprocessing.Process in dataloader.py

I am reading the code of dataloader.py.
PyTorch uses a `threading.Thread` to manage `_pin_memory` (pytorch/dataloader.py at master · pytorch/pytorch · GitHub), but it uses `multiprocessing.Process` to manage the dataloader workers (pytorch/dataloader.py at master · pytorch/pytorch · GitHub).

What are the considerations behind this choice?

Does this section answer your question?

    # `pin_memory_thread_done_event`:
    #   A `threading.Event` for a similar purpose to that of
    #   `workers_done_event`, but is for the `pin_memory_thread`. The reason
    #   that separate events are needed is that `pin_memory_thread` reads from
    #   the output queue of the workers. But the workers, upon seeing that
    #   `workers_done_event` is set, only want to see the final `None`, and are
    #   not required to flush all data in the output queue (e.g., they may call
    #   `cancel_join_thread` on that queue if their `IterableDataset` iterator
    #   happens to exhaust coincidentally, which is out of the control of the
    #   main process). Thus, since we will exit `pin_memory_thread` before the
    #   workers (see below), two separate events are used.
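The broader rationale: workers run as `multiprocessing.Process` because dataset loading and preprocessing are CPU-bound and would otherwise be serialized by the GIL, while `_pin_memory` runs as a `threading.Thread` because pinning must happen in the main process (the one that owns the tensors being handed to CUDA) and is mostly waiting on the queue anyway. The two-event shutdown described above can be sketched with stdlib primitives only. This is a minimal illustration, not PyTorch's actual implementation: the names `_worker`, `_pin_loop`, and `run_pipeline` are hypothetical, `item * 10` stands in for pinning a tensor, and the `"fork"` start method is assumed (POSIX-only) to keep the sketch self-contained.

```python
import multiprocessing as mp
import queue
import threading


def _worker(out_q):
    # Stand-in for a dataloader worker: a separate *process*, so heavy
    # preprocessing is not serialized by the GIL.
    for i in range(5):
        out_q.put(i)
    out_q.put(None)  # sentinel: this worker is finished


def _pin_loop(in_q, results, pin_done_event):
    # Stand-in for `pin_memory_thread`: a *thread* in the main process,
    # with its own done-event, since it must exit before the workers do
    # and cannot rely on the workers flushing their output queue.
    while not pin_done_event.is_set():
        try:
            item = in_q.get(timeout=0.1)
        except queue.Empty:
            continue
        if item is None:
            break
        results.append(item * 10)  # stand-in for pinning the tensor


def run_pipeline():
    ctx = mp.get_context("fork")  # assumption: POSIX fork start method
    out_q = ctx.Queue()
    results = []
    pin_done = threading.Event()  # mirrors pin_memory_thread_done_event

    w = ctx.Process(target=_worker, args=(out_q,))
    t = threading.Thread(target=_pin_loop, args=(out_q, results, pin_done))
    w.start()
    t.start()
    t.join()        # the thread exits first, on seeing the sentinel
    pin_done.set()  # then its done-event is set, before joining workers
    w.join()
    return results


if __name__ == "__main__":
    print(run_pipeline())
```

Note the shutdown order: the consumer thread is retired before the producer process is joined, which is exactly why the real code keeps `pin_memory_thread_done_event` separate from `workers_done_event`.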