I want to run ensemble inference on the same validation data across multiple GPUs (4 GPUs in my case).
The framework already has some data parallelism built in, and if I use just a single model for inference, it works well, with utilization above 85% on all 4 GPUs.
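For context, the single-model path is essentially standard data-parallel evaluation, something like the sketch below. This is simplified, not my exact code: I am using `torch.nn.DataParallel` just for illustration, and `load_model` / `val_loader` are placeholders for my framework's own functions.

```python
import torch

# Simplified sketch of the single-model case that already works well.
# load_model and val_loader are placeholders for the real framework code.
model = torch.nn.DataParallel(load_model('model1'), device_ids=[0, 1, 2, 3]).cuda()
model.eval()

with torch.no_grad():
    for batch in val_loader:
        prediction = model(batch.cuda())  # each batch is split across the 4 GPUs
```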
But when I try to use 2 models for inference, it gets much slower, and both GPU and CPU utilization drop to 25%. I think this must be caused by my not parallelizing it correctly (I am using a for loop here):
```python
import torch

##### This is for the evaluation #####
pretrained_models = ['model1', 'model2']
pool = []
for cur_model in pretrained_models:
    prediction = prediction_dict[cur_model]  # prediction tensor from one model
    pool.append(prediction.unsqueeze(0))

tmp = torch.cat(pool)                    # shape: (num_models, ...)
ensemble_pred = tmp.mode(dim=0).values   # element-wise majority vote
my_metric_save(ensemble_pred)
```
The basic idea is: assuming we already have the prediction vectors from both pretrained models, I use a for loop to extract them one after another and finally combine them into a new prediction vector, `ensemble_pred`. I don't know how to profile the runtime, but this probably breaks the original parallel flow, which would explain why validation slowed down so dramatically.
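For reference, this is what the mode-based majority vote computes on a toy example (three hypothetical models with made-up class predictions, so each vote has a clear majority):

```python
import torch

# Rows = per-model class predictions for 3 samples (made-up values).
preds = torch.tensor([[0, 1, 2],
                      [0, 1, 1],
                      [1, 1, 2]])

# Element-wise majority vote across the model dimension.
print(preds.mode(dim=0).values)  # tensor([0, 1, 2])
```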
Could someone provide some guidance on an efficient way to do ensemble inference (multiple pretrained models evaluating the same data)?
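One direction I have been considering, in case it helps frame the question: put each pretrained model on its own GPU so the forward passes can overlap, since CUDA kernels are launched asynchronously from the host. This is just an untested sketch; `load_model` is a placeholder, and I am assuming the models output classification logits:

```python
import torch

# Hypothetical sketch: one model per GPU, forward passes launched back to back
# so they can run concurrently (CUDA calls are asynchronous on the host side).
devices = ['cuda:0', 'cuda:1']
models = [load_model(name).to(dev).eval()  # load_model is a placeholder
          for name, dev in zip(['model1', 'model2'], devices)]

@torch.no_grad()
def ensemble_predict(batch):
    outs = [m(batch.to(dev, non_blocking=True)) for m, dev in zip(models, devices)]
    # Assuming classification logits; gather per-model predictions on the CPU.
    preds = [o.argmax(dim=1).cpu().unsqueeze(0) for o in outs]
    return torch.cat(preds).mode(dim=0).values  # majority vote
```

Would something like this be the right direction, or is there a more idiomatic way, e.g. keeping the existing data parallelism for each model?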