I wrote a pretty involved custom DataLoader class. It works fine with 0 workers, but as soon as I use 1 or more workers, I run into issues.
In my __getitem__(self, index) method, in addition to returning the data and labels, I also want to cache intermediate results so that they can be reused later. For this purpose I have a dictionary, self.cache, and I store these results in self.cache[index].
The puzzling thing is not that two workers might try to read/write the same dictionary entry simultaneously. Rather, self.cache doesn’t seem to be shared between workers at all. For example, if I try to access the field from the training script (i.e. dataset.cache[index]), there is nothing there.
Is there a detailed guide/documentation on how to create a workaround for this? I’m obviously new to this …
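For reference, here is a minimal sketch of what I believe is going on, written with only the standard library so it runs without torch. With one or more workers, each worker is a separate process that operates on its own copy of the dataset object, so writes to a plain dict never reach the parent process. The names CachingDataset, worker, cached_keys_after_worker and demo are hypothetical stand-ins, not PyTorch APIs, and the "fork" start method is an assumption that only holds on POSIX systems. One possible workaround (not necessarily the best one) is to back the cache with a multiprocessing.Manager dict, which lives in a separate server process that every worker talks to.

```python
import multiprocessing as mp


class CachingDataset:
    """Hypothetical stand-in for a torch Dataset that caches per-index results."""

    def __init__(self, cache):
        # Either a plain dict or a Manager().dict() proxy.
        self.cache = cache

    def __len__(self):
        return 4

    def __getitem__(self, index):
        if index not in self.cache:
            # Pretend this is an expensive intermediate result.
            self.cache[index] = index * index
        return self.cache[index]


def worker(dataset, indices):
    # Runs in a child process, similar to what a DataLoader worker does.
    for i in indices:
        dataset[i]


def cached_keys_after_worker(cache):
    """Fill the cache from a child process; return the keys the parent sees."""
    ctx = mp.get_context("fork")  # assumption: POSIX-only start method
    dataset = CachingDataset(cache)
    proc = ctx.Process(target=worker, args=(dataset, range(len(dataset))))
    proc.start()
    proc.join()
    return sorted(dataset.cache.keys())


def demo():
    # Plain dict: the child fills its own copy, the parent's stays empty.
    plain = cached_keys_after_worker({})
    # Manager dict: both processes talk to the same server-side dict.
    with mp.get_context("fork").Manager() as manager:
        shared = cached_keys_after_worker(manager.dict())
    return plain, shared


if __name__ == "__main__":
    plain, shared = demo()
    print("plain dict seen by parent:  ", plain)
    print("Manager dict seen by parent:", shared)
```

Two caveats with the Manager approach: every cache access becomes an inter-process call, and cached values must be picklable, so it may only pay off when the cached computation is expensive compared to that overhead.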