Hi,
I have a question regarding PyTorch DataLoaders. I know that when we set num_workers > 0 in the DataLoader, it creates multiple processes, not threads, so there is no shared memory among them. Each worker process is passed a copy of the dataset object, the collate_fn, and the worker_init_fn (according to the docs here).
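
For reference, here is a minimal sketch of the setup I mean (my_dataset is just a placeholder; the worker_init_fn prints each worker's PID only to show that the workers are separate processes):

```python
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # Runs once inside each worker process at startup.
    # get_worker_info() exposes this worker's id and its own copy
    # of the dataset object; distinct PIDs confirm these are
    # processes, not threads.
    info = torch.utils.data.get_worker_info()
    print(f"worker {info.id} running in PID {os.getpid()}")

if __name__ == "__main__":
    my_dataset = TensorDataset(torch.arange(100).float())  # placeholder dataset
    loader = DataLoader(my_dataset, batch_size=10, num_workers=2,
                        worker_init_fn=worker_init_fn)
    for batch in loader:
        pass  # iterating spawns the workers and triggers worker_init_fn
```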
Now, suppose my dataset object has an attribute that holds all the data and takes n amount of RAM. Will my program then use n * num_workers memory, or will the DataLoader somehow shard the data so that each worker has access to only a subset of it and the total memory across all workers stays at just n?
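
To make the question concrete, here is roughly the kind of dataset I have in mind (the class name and tensor size are just placeholders; self.data stands in for the attribute that takes n RAM):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class InMemoryDataset(Dataset):
    def __init__(self):
        # Placeholder: the whole dataset preloaded into one attribute,
        # occupying roughly n bytes of RAM (~400 MB of float32 here).
        self.data = torch.randn(100_000, 1000)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

if __name__ == "__main__":
    loader = DataLoader(InMemoryDataset(), batch_size=64, num_workers=4)
    # The question: does iterating this peak at roughly n of RAM,
    # or at n * num_workers?
    for batch in loader:
        pass
```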