Sharing the same model (on GPU) among multiple CPU processes


I have a model that I load on the GPU to run many inferences. I can’t forward everything as a single batch, because the inferences belong to different trajectories and each inference depends on the previous ones in its trajectory (the model is actually a reinforcement learning policy). So I would like to load the model on the GPU once, then set up a pool of processes that all use this model to run inferences.

I’ve tried something like this:

import torch
import torch.multiprocessing as mp

model = torch.load(policy_path, map_location="cuda:0")

workers_params = [(model, params1), (model, params2), ... (model, paramsN)]

def trajectory(model, params):
    for _ in range(10):
        # do an inference with the model, depending on params
        # and the previous inference
        pass

with mp.Pool(None) as pool:
    results = pool.starmap(trajectory, workers_params)

The problem is that the model is duplicated on the GPU for each CPU process, which raises an out-of-memory error.
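
One direction I could imagine, as a minimal sketch rather than a confirmed fix: keep the model in a single process and run the trajectories in threads instead of worker processes, so that every trajectory refers to the same model object. Everything below is hypothetical: `stand_in_model` is a plain-Python placeholder for the policy's forward pass, and `run_trajectories` and `max_workers` are names made up for illustration. Whether threads give enough parallelism would depend on how much of each step is Python-side work.

```python
from concurrent.futures import ThreadPoolExecutor


def stand_in_model(state, param):
    # Hypothetical placeholder for the policy's forward pass (not a real model).
    return state * 2 + param


def trajectory(model, param, steps=10):
    state = 0
    for _ in range(steps):
        # Each step consumes the result of the previous step,
        # so the steps of one trajectory cannot be batched together.
        state = model(state, param)
    return state


def run_trajectories(model, params, max_workers=4):
    # Threads live in one process, so `model` is a single shared object
    # rather than one copy per worker.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: trajectory(model, p), params))


if __name__ == "__main__":
    print(run_trajectories(stand_in_model, [0, 1, 2]))
```

With the real policy, `stand_in_model` would be replaced by a call to the loaded model, and the per-step inputs would be built from `params` and the previous output.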