Hello,
I have a model loaded on the GPU for multiple inferences. I can't forward a whole batch at once, because each inference depends on the previous inferences along different trajectories (the model is actually a reinforcement learning policy). So I would like to load the model on the GPU once, then set up a pool of processes that can all use this model for inference.
I've tried something like this:
import torch
import torch.multiprocessing as mp

def trajectory(model, params):
    for _ in range(10):
        # do an inference with the model, depending on params
        # and the previous inference
        ...

if __name__ == "__main__":
    mp.set_start_method('spawn')
    model = torch.load(policy_path, map_location="cuda:0")
    model.share_memory()
    workers_params = [(model, params1), (model, params2), ..., (model, paramsN)]
    with mp.Pool(None) as pool:
        results = pool.starmap(trajectory, workers_params)
The problem is that the model is duplicated on the GPU for each CPU process, which raises an out-of-memory error.
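To illustrate where the duplication comes from: `Pool.starmap` pickles each task's argument tuple and re-materializes it inside a worker process, so every task ends up with its own copy of whatever you pass in. Here is a minimal stdlib-only sketch of that mechanism (no torch, no GPU; the dict is a hypothetical stand-in for the policy network — torch.multiprocessing adds CUDA-specific sharing on top of this, but each spawned worker still carries its own state):

```python
import pickle

# Hypothetical stand-in for the loaded policy network.
model = {"layer1": [0.0] * 1000, "layer2": [0.0] * 1000}

# Argument tuples, one per trajectory, as passed to Pool.starmap.
workers_params = [(model, p) for p in range(4)]

# Roughly what the pool machinery does with each task: serialize the
# argument tuple and rebuild it in a worker process. Each task therefore
# gets an independent copy of the model rather than a shared reference.
copies = [pickle.loads(pickle.dumps(task)) for task in workers_params]

assert all(m == model for m, _ in copies)      # same contents...
assert all(m is not model for m, _ in copies)  # ...but distinct objects
```

The same per-task round trip happens with the 'spawn' start method, which is what makes the model appear once per process on the GPU.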