DataLoader, parallelism, SSD, and long hangs

Hi, my profiler reports the following result for the training loop:

There are two problematic things:

  1. a method in popen_spawn_posix.py that bottlenecks training
  2. a __del__ method in dataloader.py that takes suspiciously long

The problem only appears when I use num_workers > 0. My data resides on an SSD.

Remarks about the code:

  • The dataset’s __getitem__ loads a single .npy file, ~3MB in size.
  • The dataset’s __init__ loads a single .json file, ~40MB in size (a rough sketch of the dataset follows after the snippet below).
  • dataloader:
torch.utils.data.DataLoader(
    dataset,
    batch_size=4,
    shuffle=False,
    sampler=None,
    num_workers=4,
    collate_fn=custom_collate_fn,
    drop_last=True,
    persistent_workers=True,
)
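
For reference, the dataset is shaped roughly like this (a minimal sketch, not my actual code; the JSON structure, file paths, and key names are placeholders):

import json

import numpy as np
import torch
from torch.utils.data import Dataset


class NpyDataset(Dataset):
    """Rough sketch of the dataset described above (all names are hypothetical)."""

    def __init__(self, index_json_path):
        # The ~40MB JSON index is loaded here, so it becomes part of the
        # dataset state that every worker process ends up with.
        with open(index_json_path) as f:
            self.index = json.load(f)

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        # Each item reads a single ~3MB .npy file from the SSD.
        entry = self.index[i]
        arr = np.load(entry["npy_path"])
        return torch.from_numpy(arr), entry["label"]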

Also, having persistent_workers=True makes me wonder why the next bottleneck is __del__ from dataloader.py.
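
If I understand it correctly, that __del__ belongs to the loader's multiprocessing iterator, so it should mostly matter when the iterator (or the whole DataLoader) is recreated every epoch rather than reused. A simplified comparison of the two patterns I have in mind (dataset and custom_collate_fn as in the snippet above; n_epochs and train_step are placeholders):

# Pattern A: a new DataLoader every epoch. Even with persistent_workers=True,
# the previous epoch's workers are shut down when the old loader's iterator is
# garbage collected, which is where __del__ time would pile up.
for epoch in range(n_epochs):
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=4, num_workers=4,
        collate_fn=custom_collate_fn, drop_last=True, persistent_workers=True,
    )
    for batch in loader:
        train_step(batch)

# Pattern B: one DataLoader for the whole run. The same worker processes are
# reused across epochs and only torn down once at the very end.
loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, num_workers=4,
    collate_fn=custom_collate_fn, drop_last=True, persistent_workers=True,
)
for epoch in range(n_epochs):
    for batch in loader:
        train_step(batch)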
Thanks for your help.

Without loading the 40MB file in __init__ it gets faster, but it is still substantially slower than with num_workers=0.
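
A minimal sketch of how such a comparison could be timed, separating worker start-up cost from steady-state iteration (NpyDataset is the hypothetical sketch from above, and the index path is a placeholder rather than my real setup):

import time

from torch.utils.data import DataLoader


def time_first_and_rest(dataset, num_workers, collate_fn=None):
    """Time the first batch (includes worker start-up) and the remaining batches."""
    loader = DataLoader(
        dataset,
        batch_size=4,
        num_workers=num_workers,
        collate_fn=collate_fn,
        drop_last=True,
        persistent_workers=(num_workers > 0),  # must be False when num_workers == 0
    )

    t0 = time.perf_counter()
    it = iter(loader)
    next(it)                           # worker spawn + dataset transfer land here
    t_first = time.perf_counter() - t0

    t0 = time.perf_counter()
    n_rest = sum(1 for _ in it)        # drain the remaining batches
    t_rest = time.perf_counter() - t0
    return t_first, t_rest / max(n_rest, 1)


dataset = NpyDataset("index.json")     # placeholder path
for w in (0, 4):
    t_first, t_batch = time_first_and_rest(dataset, w)
    print(f"num_workers={w}: first batch {t_first:.2f}s, per remaining batch {t_batch:.3f}s")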