I am having an issue on both Ubuntu and Windows: I have a Dataset subclass that loads a dict from a pickle file in its __init__, and whenever I construct a DataLoader over it with more than 0 workers, the program stalls indefinitely. Something like:
import pickle

from torch.utils.data import DataLoader, Dataset


def open_pkl(pkl_fp):
    with open(pkl_fp, "rb") as f:
        d = pickle.load(f)
    return d


class Sample_Dataset(Dataset):
    def __init__(self, pkl_fp):
        super().__init__()
        self.data = open_pkl(pkl_fp)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]


if __name__ == "__main__":
    fp = "/data.pkl"
    dl = DataLoader(dataset=Sample_Dataset(fp), num_workers=8, batch_size=8)
    for data in dl:
        debug = "debug"
Is there any way around this so that each worker can access the data without the indefinite stall? In my particular use case, the code works as intended when I use num_workers=0.
james@james-System-Product-Name:~$ ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 327686 james 600 4194304 2 dest
0x00000000 491527 james 600 524288 2 dest
0x00000000 491528 james 600 524288 2 dest
0x00000000 1179660 james 600 53248 2 dest
0x00000000 1179661 james 600 53248 2 dest
0x00000000 1376277 james 600 56808 2 dest
0x00000000 28 james 600 524288 2 dest
0x00000000 294949 james 600 524288 2 dest
0x00000000 294953 james 600 4194304 2 dest
0x00000000 491563 james 600 524288 2 dest
0x00000000 327726 james 600 524288 2 dest
0x00000000 458802 james 600 524288 2 dest
0x00000000 458809 james 600 524288 2 dest
james@james-System-Product-Name:~$ ipcs -l
------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398509481980
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 32000
max semaphores per array = 32000
max semaphores system wide = 1024000000
max ops per semop call = 500
semaphore max value = 32767
I guess the alternative is lazy loading, if that is the issue (the pickle is probably between 700 MB and 1 GB), but it would be nice for each process to get its own copy, since I have 64 GB of RAM.
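As a sketch of that lazy-loading idea (the class and attribute names below are my own, not from this thread): defer the pickle.load until the first __getitem__, so each worker loads the file itself after the process is started rather than inheriting the parent's already-loaded object. A map-style dataset only needs __len__ and __getitem__, so this sketch skips the torch import to stay self-contained:

```python
import pickle


class LazyPickleDataset:
    """Map-style dataset that defers loading the pickle until first access.

    Each DataLoader worker process then performs its own load, instead of
    the DataLoader having to hand the parent's loaded dict to the workers.
    """

    def __init__(self, pkl_fp):
        self.pkl_fp = pkl_fp
        self._data = None  # loaded lazily, once per process

    def _load(self):
        if self._data is None:
            with open(self.pkl_fp, "rb") as f:
                self._data = pickle.load(f)
        return self._data

    def __len__(self):
        return len(self._load())

    def __getitem__(self, index):
        return self._load()[index]
```

The trade-off is exactly what you describe wanting: every worker ends up holding its own full copy of the dict, and the first index access in each worker pays the load cost once.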
Your code looks alright and I don’t see any issue with it on my machine.
Note that the code also doesn't print anything, so I'm unsure how you are verifying what exactly is working.
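One quick way to make the loop observable (just an illustration; the wrapper name here is made up): print the serving process's PID from __getitem__, so you can confirm that items are actually being fetched and by which process:

```python
import os


class NoisyMapping:
    """Wrap any mapping and log which process serves each lookup."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # Shows which process handled the request and for which index.
        print(f"pid {os.getpid()} served index {index}")
        return self.data[index]


wrapped = NoisyMapping({0: "a", 1: "b"})
first = wrapped[0]  # logs the pid, then returns the value
```

Wrapping self.data in the Dataset with something like this would at least show whether the workers ever get past startup.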