How to reduce the number of times an object is loaded in PyTorch DistributedDataParallel?

I am working on a PyTorch project built on mmdetection. In this project, the ground truths are generated from a huge file that must be loaded into memory before training starts. The following code illustrates the setup:

In tools/

from annotation_handler import preload_annotation

# ...

assert os.path.exists(



ANNOS = dict()

def preload_annotation(path):
    ANNOS.update({path: load(path)})

def get_annotation(query, anno_path):
    global ANNOS
    if anno_path not in ANNOS:
        # NOTE: load() is costly in time
        ANNOS.update({anno_path: load(anno_path)})
    gts = generate_proper_groundtruth(query, ANNOS[anno_path])
    return gts

In dataset

from annotation_handler import get_annotation

class MyDataset(Dataset):
    def __getitem__(self, idx):
        source = load_source(idx)
        query = querys[idx]
        target = get_annotation(query, self.anno_path)
        return source, target

My implementation expects the large file to be loaded only once. However, with distributed training I found that the large annotation file is reloaded at the beginning of each epoch, and the number of reloads equals the number of workers × the number of GPUs. As a result, the data loading time increases dramatically at the beginning of each epoch.
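
To make the suspected mechanism concrete, here is a minimal standard-library sketch, not the actual mmdetection code. It assumes the reloads happen because every freshly started worker process gets its own empty copy of the module-level cache. `load` is a hypothetical stand-in for the costly loader, `_worker` and `count_worker_loads` are illustrative names, and the `fork` start method (POSIX only) is assumed. Workers are started one after another here purely to keep the sketch short.

```python
import multiprocessing as mp

ANNOS = {}  # module-level cache; each process holds its own copy


def load(path):
    """Hypothetical stand-in for the costly load() in the question."""
    return {"path": path}


def get_annotation(anno_path):
    """Load lazily on first use, then serve from the per-process cache."""
    if anno_path not in ANNOS:
        ANNOS[anno_path] = load(anno_path)
    return ANNOS[anno_path]


def _worker(anno_path, conn):
    # Tell the parent whether this worker had to call load() itself.
    had_to_load = anno_path not in ANNOS
    get_annotation(anno_path)
    conn.send(had_to_load)
    conn.close()


def count_worker_loads(anno_path, num_workers=2, num_epochs=2):
    """Start fresh workers every epoch and count how many of them call load()."""
    ctx = mp.get_context("fork")  # assumption: POSIX, fork start method
    loads = 0
    for _ in range(num_epochs):
        for _ in range(num_workers):
            recv_end, send_end = ctx.Pipe(duplex=False)
            proc = ctx.Process(target=_worker, args=(anno_path, send_end))
            proc.start()
            loads += recv_end.recv()  # True counts as 1
            proc.join()
            recv_end.close()
            send_end.close()
    return loads


if __name__ == "__main__":
    print(count_worker_loads("anno.json"))  # parent cache empty: prints 4
    get_annotation("anno.json")             # preload once in the parent
    print(count_worker_loads("anno.json"))  # workers inherit the cache: prints 0
```

In this sketch the load count is num_workers × num_epochs when the parent cache is empty, and zero once the parent has preloaded before forking. If that is also what happens in the real training run, keeping workers alive between epochs (for example `persistent_workers=True` on `torch.utils.data.DataLoader`) would presumably avoid the per-epoch reload, though this is untested here.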

  1. How can I ensure the large annotation file is loaded only once, at the beginning?
  2. How can I share the large annotation across multiple subprocesses in torch.multiprocessing?