Hi,

Imagine you have enough RAM to load all of the images and targets of your dataset into lists during the `__init__` of your `Dataset` class. When training a model with `torchrun`, each GPU process creates its own `Dataset`, which leads to `N * dataset_size` of RAM being used.
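For reference, the setup looks roughly like this (a simplified sketch rather than my actual COCO-based class; `load_image` and `build_target` stand in for my real loading code):

```python
from torch.utils.data import Dataset

class PreloadedDataset(Dataset):
    def __init__(self, image_paths, annotations):
        # Everything is decoded up front and kept in plain Python lists,
        # so every torchrun process ends up holding its own full copy.
        self.images = [load_image(p) for p in image_paths]      # placeholder
        self.targets = [build_target(a) for a in annotations]   # placeholder

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.targets[idx]
```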
Is it possible to share the `images` and `targets` lists across all processes without replication, so that the total memory consumption stays at `dataset_size`?
FYI, my `Dataset` inherits from COCO.
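To clarify what I mean by "sharing without replication", something along the lines of the sketch below is what I'm imagining: rank 0 puts the data into a named shared-memory block and the other ranks just attach to it. This is only a rough sketch; it assumes fixed-size images, that `torch.distributed` is already initialised by `torchrun`, and the block name and sizes are made up.

```python
import numpy as np
import torch.distributed as dist
from multiprocessing import shared_memory

SHM_NAME = "coco_images"          # arbitrary name, shared by all ranks
N, H, W = 5000, 480, 640          # made-up dataset / image sizes
SHAPE = (N, H, W, 3)
NBYTES = int(np.prod(SHAPE))      # uint8 -> one byte per element

if dist.get_rank() == 0:
    # Rank 0 allocates the shared block and fills it once.
    shm = shared_memory.SharedMemory(name=SHM_NAME, create=True, size=NBYTES)
    images = np.ndarray(SHAPE, dtype=np.uint8, buffer=shm.buf)
    # ... decode the dataset here and write each image into images[i] ...
    dist.barrier()                # only release the other ranks once it is filled
else:
    dist.barrier()
    # Other ranks attach to the existing block: no extra copy of the data.
    shm = shared_memory.SharedMemory(name=SHM_NAME)
    images = np.ndarray(SHAPE, dtype=np.uint8, buffer=shm.buf)
```

The `targets` list would presumably need a similar treatment (e.g. a pickled buffer), since it is not a fixed-shape array. Is something like this the way to go, or is there a more standard mechanism?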