Opening a .pkl file in a Dataset used with a DataLoader with multiple workers makes the program hang

I am having an issue on both Ubuntu and Windows: when I create a Dataset object that loads a dict from a pickle file in its __init__ function, and then create a DataLoader with more than 0 workers, the program stalls indefinitely. Something like:

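import pickle
from torch.utils.data import DataLoader, Dataset

def open_pkl(pkl_fp):
    with open(pkl_fp, "rb") as f:
        d = pickle.load(f)
    return d

class Sample_Dataset(Dataset):
    def __init__(self, pkl_fp):
        # Load the whole dict from the pickle once, when the dataset is created
        self.data = open_pkl(pkl_fp)
        self.keys = list(self.data.keys())

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, index):
        # Hypothetical body: return the entry stored under the index-th key
        return self.data[self.keys[index]]

if __name__ == "__main__":
    fp = "/data.pkl"
    dl = DataLoader(Sample_Dataset(fp), num_workers=8, batch_size=8)
    for data in dl:
        debug = "debug"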

Is there any way around this so that each worker can access the data without the program stalling? In my use case, the code works as intended when I use num_workers=0.

Could you check how much shared memory your system has? Maybe the code is hanging while trying to copy the numpy array to shared memory.
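A minimal sketch for checking this from Python, assuming Linux, where POSIX shared memory is typically a tmpfs mounted at /dev/shm (`df -h /dev/shm` reports the same numbers). DataLoader workers hand batches back through shared memory, so this mount is the one worth checking; `ipcs -m` lists System V segments, which are limited separately. The helper name `shm_usage_mib` is hypothetical.

```python
import os
import shutil


def shm_usage_mib(path="/dev/shm"):
    """Return (total, used, free) in MiB for the filesystem at `path`.

    Returns None if `path` does not exist, e.g. on Windows or macOS,
    where shared memory is not exposed as a /dev/shm mount.
    """
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    mib = 1024 ** 2
    return usage.total / mib, usage.used / mib, usage.free / mib


if __name__ == "__main__":
    result = shm_usage_mib()
    if result is None:
        print("/dev/shm not found")
    else:
        total, used, free = result
        print(f"/dev/shm: total={total:.0f} MiB, used={used:.0f} MiB, free={free:.0f} MiB")
```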

james@james-System-Product-Name:~$ ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 327686 james 600 4194304 2 dest
0x00000000 491527 james 600 524288 2 dest
0x00000000 491528 james 600 524288 2 dest
0x00000000 1179660 james 600 53248 2 dest
0x00000000 1179661 james 600 53248 2 dest
0x00000000 1376277 james 600 56808 2 dest
0x00000000 28 james 600 524288 2 dest
0x00000000 294949 james 600 524288 2 dest
0x00000000 294953 james 600 4194304 2 dest
0x00000000 491563 james 600 524288 2 dest
0x00000000 327726 james 600 524288 2 dest
0x00000000 458802 james 600 524288 2 dest
0x00000000 458809 james 600 524288 2 dest

james@james-System-Product-Name:~$ ipcs -l
------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398509481980
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 32000
max semaphores per array = 32000
max semaphores system wide = 1024000000
max ops per semop call = 500
semaphore max value = 32767

If this is the issue, I guess the alternative is lazy loading (the pickle is roughly 700 MB to 1 GB), but it would be nice for each process to get its own copy, since I have 64 GB of RAM.
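A minimal sketch of the lazy-loading alternative, assuming the pickle holds a dict and that one copy per worker fits in RAM: `__init__` keeps only the path and the key list, and each process unpickles its own copy of the dict on its first `__getitem__` call. `LazyPickleDataset` is a hypothetical name. It is written as a plain class only so the sketch runs without PyTorch; DataLoader accepts any object implementing `__len__` and `__getitem__` as a map-style dataset, and subclassing `torch.utils.data.Dataset` works the same way.

```python
import pickle


class LazyPickleDataset:
    """Map-style dataset that defers loading the pickled dict until first access."""

    def __init__(self, pkl_fp):
        self.pkl_fp = pkl_fp
        # Read the file once to record the keys (needed for __len__),
        # then drop the dict so it is not carried into worker processes.
        with open(pkl_fp, "rb") as f:
            self.keys = list(pickle.load(f).keys())
        self._data = None  # filled in lazily, once per process

    def __len__(self):
        return len(self.keys)

    def _ensure_loaded(self):
        # Each process (main or worker) loads its own copy on first use.
        if self._data is None:
            with open(self.pkl_fp, "rb") as f:
                self._data = pickle.load(f)

    def __getitem__(self, index):
        self._ensure_loaded()
        return self._data[self.keys[index]]
```

Usage would be along the lines of `DataLoader(LazyPickleDataset("/data.pkl"), num_workers=8, batch_size=8)`. The keys are read up front because the sampler calls `len()` on the dataset in the main process before any worker fetches data.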

Actually, @ptrblck, it seems that num_workers generally doesn't work for me when it is set to a value larger than 0. This code block stalls:

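import torch
from torch.utils.data import DataLoader, Dataset

class dts(Dataset):
    def __init__(self, dataset_size=10):
        self.dataset_size = dataset_size
        # Random images of shape (3, 600, 600)
        self.data = torch.rand(dataset_size, 3, 600, 600)

    def __len__(self):
        return self.dataset_size

    def __getitem__(self, idx):
        return self.data[idx]

if __name__ == "__main__":
    # Stalls as soon as num_workers is set to a value larger than 0
    dl = DataLoader(dts(), num_workers=0, batch_size=2)
    for data in dl:
        debug = "debug"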

My specs are:
OS: Ubuntu 20.04
GPUS: 3 Nvidia A4000
Python Version: 3.7
PyTorch Version: 1.14 nightly
Cuda Version: 11.7
GPU Driver: NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.

I will post this on Stack Overflow and GitHub, since there seems to be a general issue here, assuming my code is OK.

Your code looks alright and I don’t see any issue with it on my machine.
Note that the code also doesn't print anything, so I'm unsure how you are verifying what exactly is working.

Well, the only thing I notice is that it stalls indefinitely in the for loop in the last two lines. I'm not sure what else to do.
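One way to see where the loop is stuck, sketched with the standard library's `faulthandler` module: a timer is armed before each `next()` call, and if fetching a batch takes longer than `timeout_s` seconds, the stack of every thread in the main process is written to stderr while the program keeps running. It also prints one line per batch, so progress is visible. `iterate_with_watchdog` and its default timeout are hypothetical, and worker processes are not covered. When num_workers > 0, passing `timeout=...` to `DataLoader` is an alternative that raises an error instead of waiting forever.

```python
import faulthandler
import sys


def iterate_with_watchdog(loader, timeout_s=60.0):
    """Iterate over `loader`, printing one line per batch.

    If a single next() call blocks for more than `timeout_s` seconds,
    faulthandler writes the traceback of every thread in this process
    to stderr; the program keeps running (exit=False).
    Returns the number of batches consumed.
    """
    it = iter(loader)
    count = 0
    while True:
        # Arm the timer just before fetching the next batch.
        faulthandler.dump_traceback_later(timeout_s, exit=False, file=sys.__stderr__)
        try:
            batch = next(it)
        except StopIteration:
            break
        finally:
            # Disarm it as soon as next() returns or raises.
            faulthandler.cancel_dump_traceback_later()
        count += 1
        print(f"batch {count}: {type(batch).__name__}", flush=True)
    return count
```

In the snippet above, `iterate_with_watchdog(dl, timeout_s=30)` would replace the `for data in dl:` loop.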

@ptrblck, as I noted in the GitHub issue, this was caused by PyCharm's debug mode and has nothing to do with PyTorch. Apologies.
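Since the hang only happened under PyCharm's debugger, one possible workaround is to fall back to `num_workers=0` while debugging. A minimal sketch, assuming the debugger installs a Python trace function (PyCharm's pydevd has traditionally done so via `sys.settrace`). This is only a heuristic: coverage tools also set a trace function, and debuggers on Python 3.12+ may use `sys.monitoring` instead, in which case no debugger is detected. Both helper names are hypothetical.

```python
import sys


def debugger_attached() -> bool:
    # Heuristic: a trace function is usually installed by pure-Python
    # debuggers (and by coverage tools, so false positives are possible).
    return sys.gettrace() is not None


def choose_num_workers(requested: int) -> int:
    # Use single-process loading when a debugger seems to be attached,
    # otherwise keep the requested number of workers.
    return 0 if debugger_attached() else requested
```

Usage would look like `DataLoader(dataset, num_workers=choose_num_workers(8), batch_size=8)`.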


Good to hear you’ve narrowed down the issue!
