DataLoader, parallelism, SSD, and long hangs

Hi, my profiler reports the following result for the training loop:

There are two problematic things:

  1. a method in popen_spawn_posix.py that bottlenecks training
  2. a __del__ method in dataloader.py that takes suspiciously long

The problem only appears when I use num_workers > 0. My data resides on an SSD.

Remarks about the code:

  • The dataset’s __getitem__ loads a single .npy file, ~3MB in size.
  • The dataset’s __init__ loads a single .json file, ~40MB in size (a rough sketch of the dataset follows after the snippet below).
  • dataloader:
torch.utils.data.DataLoader(
    dataset,
    batch_size=4,
    shuffle=False,
    sampler=None,
    num_workers=4,
    collate_fn=custom_collate_fn,
    drop_last=True,
    persistent_workers=True,
)
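
For reference, the dataset is shaped roughly like this (a minimal sketch, not my actual code; the JSON structure, file paths, and key names are placeholders):

import json

import numpy as np
import torch
from torch.utils.data import Dataset


class NpyDataset(Dataset):
    """Rough sketch of the dataset described above (all names are hypothetical)."""

    def __init__(self, index_json_path):
        # The ~40MB JSON index is loaded here, so it becomes part of the
        # dataset state that every worker process ends up with.
        with open(index_json_path) as f:
            self.index = json.load(f)

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        # Each item reads a single ~3MB .npy file from the SSD.
        entry = self.index[i]
        arr = np.load(entry["npy_path"])
        return torch.from_numpy(arr), entry["label"]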

Also, having persistent_workers=True makes me wonder why the next bottleneck is __del__ from dataloader.py.
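
If I understand it correctly, that __del__ belongs to the loader's multiprocessing iterator, so it should mostly matter when the iterator (or the whole DataLoader) is recreated every epoch rather than reused. A simplified comparison of the two patterns I have in mind (dataset and custom_collate_fn as in the snippet above; n_epochs and train_step are placeholders):

# Pattern A: a new DataLoader every epoch. Even with persistent_workers=True,
# the previous epoch's workers are shut down when the old loader's iterator is
# garbage collected, which is where __del__ time would pile up.
for epoch in range(n_epochs):
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=4, num_workers=4,
        collate_fn=custom_collate_fn, drop_last=True, persistent_workers=True,
    )
    for batch in loader:
        train_step(batch)

# Pattern B: one DataLoader for the whole run. The same worker processes are
# reused across epochs and only torn down once at the very end.
loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, num_workers=4,
    collate_fn=custom_collate_fn, drop_last=True, persistent_workers=True,
)
for epoch in range(n_epochs):
    for batch in loader:
        train_step(batch)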
Thanks for your help.

Without loading the 40MB file in __init__ it gets faster, but it is still substantially slower than with num_workers=0.
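
A minimal sketch of how such a comparison could be timed, separating worker start-up cost from steady-state iteration (NpyDataset is the hypothetical sketch from above, and the index path is a placeholder rather than my real setup):

import time

from torch.utils.data import DataLoader


def time_first_and_rest(dataset, num_workers, collate_fn=None):
    """Time the first batch (includes worker start-up) and the remaining batches."""
    loader = DataLoader(
        dataset,
        batch_size=4,
        num_workers=num_workers,
        collate_fn=collate_fn,
        drop_last=True,
        persistent_workers=(num_workers > 0),  # must be False when num_workers == 0
    )

    t0 = time.perf_counter()
    it = iter(loader)
    next(it)                           # worker spawn + dataset transfer land here
    t_first = time.perf_counter() - t0

    t0 = time.perf_counter()
    n_rest = sum(1 for _ in it)        # drain the remaining batches
    t_rest = time.perf_counter() - t0
    return t_first, t_rest / max(n_rest, 1)


dataset = NpyDataset("index.json")     # placeholder path
for w in (0, 4):
    t_first, t_batch = time_first_and_rest(dataset, w)
    print(f"num_workers={w}: first batch {t_first:.2f}s, per remaining batch {t_batch:.3f}s")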