How to share state between DataLoader workers?

I’m using torchdata along with PyTorch DataLoaders.

I want to read files from a single LMDB database. I suspect I will need to share a single database handle amongst all the DataLoader workers, since I’m not sure whether LMDB supports multi-process reading.

How can I share a single Python object amongst multiple DataLoader workers? Is this even possible?

From what I know, this is how DataLoader actually works: an integer-valued sampler generates indices, which are dispatched to the worker processes, and each worker also receives its own copy of the dataset instance that was passed to the DataLoader. [This is the case when we do not explicitly specify a sampler via the sampler argument.]
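
To illustrate, here is a minimal sketch (the `ToyDataset` is hypothetical, just for demonstration) that uses `torch.utils.data.get_worker_info()` to show which worker process produced each sample — each worker holds its own private copy of the dataset:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):  # hypothetical dataset for illustration
    def __init__(self):
        self.data = list(range(8))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # get_worker_info() returns None in the main process; in a worker
        # it describes that worker (id, its private dataset copy, ...)
        info = torch.utils.data.get_worker_info()
        worker_id = info.id if info is not None else -1
        return self.data[idx], worker_id

loader = DataLoader(ToyDataset(), batch_size=2, num_workers=2)
for batch, worker_ids in loader:
    print(batch, worker_ids)  # indices come from the sampler; workers differ
```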

With torchdata (datapipes), an additional step is to chain a sharding_filter datapipe so that each worker processes a disjoint shard of the stream instead of duplicating the full data; see the sketch below.
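
A minimal sketch of such a chain (a toy pipeline over a range, just to show the mechanics):

```python
from torch.utils.data import DataLoader
from torchdata.datapipes.iter import IterableWrapper

# sharding_filter marks the point where elements are split across workers,
# so worker k only keeps its own share of the stream
dp = IterableWrapper(range(10)).sharding_filter()
loader = DataLoader(dp, batch_size=2, num_workers=2)
for batch in loader:
    print(batch)  # each element appears once; without sharding_filter,
                  # both workers would yield the full range
```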

I am not sure if I answered/understood your question correctly; could you please elaborate more in case what I answered isn’t what you were looking for?

I’m mostly wondering if there is a way to share state between the worker processes. E.g., a common database handle or something like that. In hindsight, the answer is “probably not” because each worker process is … its own process.
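
Right — since workers are separate processes, a common workaround (a widely used pattern, not anything torchdata-specific) is to not share the handle at all and instead have each worker lazily open its own read-only LMDB environment on first access. LMDB is generally fine with multiple reader processes as long as each opens its own environment rather than inheriting one across a fork. A sketch, where the path, length, and key scheme are all hypothetical:

```python
import lmdb
from torch.utils.data import Dataset, DataLoader

class LMDBDataset(Dataset):  # hypothetical wrapper for illustration
    def __init__(self, path, length):
        self.path = path
        self.length = length
        self.env = None  # opened lazily, once per worker process

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.env is None:
            # opening here (inside the worker) avoids forking a live
            # handle; readonly=True, lock=False is a common setting
            # for many concurrent readers
            self.env = lmdb.open(self.path, readonly=True, lock=False)
        with self.env.begin() as txn:
            return txn.get(f"{idx}".encode())  # hypothetical key scheme

loader = DataLoader(LMDBDataset("data.lmdb", length=100), num_workers=4)
```

Truly shared state (one object visible to all workers) would require something like `multiprocessing` shared memory or a separate server process, which is usually more trouble than per-worker handles.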