I have a model that I want to load on the GPU once and reuse for many inferences. I can't forward a whole batch at once, because each inference depends on the previous inferences along different trajectories (the model is a reinforcement learning policy). So I would like to load the model on the GPU, then set up a pool of processes that can all use this model to run inference.
I've tried something like this:
```python
import torch
import torch.multiprocessing as mp

def trajectory(model, params):
    for _ in range(10):
        # do an inference with the model, depending on params
        # and the previous inference
        ...

if __name__ == "__main__":
    mp.set_start_method('spawn')
    model = torch.load(policy_path, map_location="cuda:0")
    model.share_memory()
    workers_params = [(model, params1), (model, params2), ..., (model, paramsN)]
    with mp.Pool(None) as pool:
        results = pool.starmap(trajectory, workers_params)
```
The problem is that the model gets duplicated on the GPU for each CPU process, which raises an out-of-memory error.
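For reference, here is a minimal sketch of the kind of workaround I'm considering: keep the single GPU copy of the model in the main process and have worker processes exchange observations and actions with it through queues, so the model itself is never pickled to the workers. The server loop, the `N_WORKERS` constant, and the dummy observation/update logic are my own assumptions for illustration, not tested code:

```python
import torch
import torch.multiprocessing as mp

N_WORKERS = 4  # assumed number of parallel trajectories

def trajectory(worker_id, request_q, response_q):
    # Each worker runs one trajectory: it sends its current observation
    # to the server process and waits for the policy output before
    # computing the next observation.
    obs = torch.zeros(8)  # placeholder initial observation
    for _ in range(10):
        request_q.put((worker_id, obs))
        action = response_q.get()
        obs = obs + action  # next step depends on the previous inference
    request_q.put((worker_id, None))  # signal that this worker is done

def serve_requests(model, request_q, response_qs):
    # Single process that owns the GPU model and answers all workers.
    done = 0
    with torch.no_grad():
        while done < len(response_qs):
            worker_id, obs = request_q.get()
            if obs is None:
                done += 1
                continue
            out = model(obs.to("cuda:0")).cpu()  # assumes the policy takes a tensor
            response_qs[worker_id].put(out)

if __name__ == "__main__":
    mp.set_start_method("spawn")
    model = torch.load(policy_path, map_location="cuda:0")  # same placeholder path as above
    request_q = mp.Queue()
    response_qs = [mp.Queue() for _ in range(N_WORKERS)]
    workers = [
        mp.Process(target=trajectory, args=(i, request_q, response_qs[i]))
        for i in range(N_WORKERS)
    ]
    for w in workers:
        w.start()
    serve_requests(model, request_q, response_qs)  # runs in the main process
    for w in workers:
        w.join()
```

The design choice here is that only CPU tensors cross process boundaries, so the spawned workers never touch CUDA and the model exists exactly once on the GPU; the trade-off is that inferences are serialized through one server process instead of running concurrently.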