(Multiprocessing) process hangs while looping through dataloader

I’ve been trying to set up parallelisation for an object detection model I’ve trained, in order to improve its throughput when running on CPU. To do this, I’m roughly following this blog post on implementing Hogwild in PyTorch.

Unfortunately, when running my script, the processes appear to hang while trying to iterate through the DataLoader. Iterating through the DataLoader before calling mp.Process works as expected, but iterating within the process causes the program to freeze.

I’ve provided a minimal example below:

import torch
import torch.multiprocessing as mp
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.distributed import DistributedSampler

def loop_thru(dataloader):
    for i, _ in enumerate(dataloader):
        print(i)

if __name__ == "__main__":
    # Settings
    num_processes = 1

    # Create data
    data = torch.ones((1024,3,352,352)).float() / 255
    dataset = TensorDataset(data)

    # Start processes to loop through data
    processes = []
    for rank in range(num_processes):
        sampler = DistributedSampler(
            dataset=dataset, num_replicas=num_processes, rank=rank, shuffle=False
        )
        dataloader = DataLoader(
            dataset, batch_size=16, sampler=sampler, num_workers=2
        )

        p = mp.Process(target=loop_thru, args=(dataloader,))
        p.start()
        processes.append(p)

    for p in processes:
        print(f"Joining: {p}")
        p.join()

When I run this script, the outputs are as follows before the program hangs:

Any idea what I’m doing wrong here?

Edit: Running pytorch=1.10.1

How long are you waiting? It ran just fine for me… produced 0-63 on separate lines.

Just tried it again, 5 minutes and still nothing.

May be worth mentioning that this is running within a Docker container, though I’m not sure that should affect anything.

Update: Changed num_workers to 1, and the dataloader now iterates correctly. The process still hangs when running the data through an actual model, but the original problem is resolved.

I also encountered this problem. Running in Jupyter in VS Code, when num_workers is specified, the Jupyter kernel crashes at the end of the DataLoader loop.
I worked around it temporarily by not setting num_workers, but I’d still prefer to be able to load data with multiple workers.

You can place your Dataset and DataLoader creation code inside your loop_thru function.
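A minimal sketch of that suggestion, restructured so each child process builds its own Dataset, DistributedSampler, and DataLoader rather than receiving them from the parent. The tensor shape, batch size, and worker count here are placeholders (a much smaller tensor than the original, for brevity), not values confirmed by the thread:

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.distributed import DistributedSampler

def loop_thru(rank, num_processes):
    # Build the dataset, sampler, and dataloader inside the child process,
    # so the DataLoader's worker processes are created after the fork.
    data = torch.ones((64, 3, 8, 8)).float() / 255  # placeholder shape, smaller than the original
    dataset = TensorDataset(data)
    sampler = DistributedSampler(
        dataset=dataset, num_replicas=num_processes, rank=rank, shuffle=False
    )
    dataloader = DataLoader(dataset, batch_size=16, sampler=sampler, num_workers=2)
    for i, _ in enumerate(dataloader):
        print(i)

if __name__ == "__main__":
    num_processes = 1
    processes = []
    for rank in range(num_processes):
        # Pass only plain picklable arguments, not the DataLoader itself
        p = mp.Process(target=loop_thru, args=(rank, num_processes))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```

This way the DataLoader’s state (including any worker processes it spawns when num_workers > 0) is never shared across a process boundary, which avoids the class of hang described above.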