Most efficient way to run inference with multiple models

Hi all! :slight_smile:

I've run into an issue with a test scenario I need to run. I have a dozen models built with the exact same architecture (WideResNet 28-10). Each of these models has slight differences due to training divergences.
However, I need to evaluate each of these models on the entire test set (4 minibatches, CIFAR-10). These models may eventually go back into a training iteration after the test.

The easy way would be to iterate over the test set and run each model in turn, i.e. roughly:

with torch.no_grad():
    for i, (input, target) in enumerate(test_loader):
        input, target = input.to(device), target.to(device)
        for model in models:
            output = model(input)
            loss = criterion(output, target)
            acc1 = accuracy(output, target)
            # metric management

This seems rather inefficient/slow (2 min per minibatch, 4 minibatches, a thousand models … :sob: ). Is there a clean way to do this that exploits the embarrassingly parallel nature of the task?

At first I thought of concatenating the models together (in the end they are just sets of layers that could operate in parallel?), but I have no clue how to achieve that in PyTorch, or even whether it's doable. Multiprocessing also came to mind as a possible option, but would it be efficient at all? Any suggestion is welcome :slight_smile:
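
(For illustration, the kind of "model concatenation" I have in mind, if something like it exists, might look roughly like the sketch below. It is untested and assumes a recent PyTorch where the torch.func API (stack_module_state, functional_call, vmap) is available; models, test_loader and device are the same names as in the snippet above.)

import copy
import torch
from torch.func import stack_module_state, functional_call

models = [m.to(device).eval() for m in models]

# Stack the parameters and buffers of all identical-architecture models
# into single tensors with a leading "model" dimension.
params, buffers = stack_module_state(models)

# A stateless template copy, used only for its structure.
base_model = copy.deepcopy(models[0]).to("meta")

def call_one(p, b, x):
    return functional_call(base_model, (p, b), (x,))

with torch.no_grad():
    for input, target in test_loader:
        input, target = input.to(device), target.to(device)
        # vmap over the model dimension, sharing the same input batch
        outputs = torch.vmap(call_one, in_dims=(0, 0, None))(params, buffers, input)
        # outputs has shape (num_models, batch_size, num_classes)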

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, model1, model2, model3):
        super().__init__()
        # You can also pass a list, rather than
        # separate models
        self.model1 = model1
        self.model2 = model2
        self.model3 = model3

    def forward(self, x):
        # Run every sub-model on the same input and return all outputs
        out1 = self.model1(x)
        out2 = self.model2(x)
        out3 = self.model3(x)
        return out1, out2, out3

This method involves moving all the models to the GPU, so GPU memory would become the bottleneck, but you will only need one set of inputs.

You can also do inference on CPU, which may be faster if your GPU RAM is small.
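
Roughly, the usage would look like this (a sketch that reuses the test loop from your question; model1, model2, model3 stand for your trained instances, and you can keep everything on CPU or move it to the GPU depending on memory):

import torch

combined = MyModel(model1, model2, model3).to(device).eval()

with torch.no_grad():
    for input, target in test_loader:
        input, target = input.to(device), target.to(device)
        out1, out2, out3 = combined(input)
        # compute loss/accuracy per output, as before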

Thanks for your suggestion. Do you know whether PyTorch would run some optimization in the background that leads to parallelization with your suggestion?
Reading it (and not being an expert in PyTorch's background optimizations), I fear it would end up running the models in sequence (as a for loop would) rather than in parallel.

The goal is to have time_parallel(model 1, …, model k) << time_unary(model 1) + … + time_unary(model k).
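
Concretely, the kind of overlap I am hoping for would be something like one CUDA stream per model, as in the sketch below (untested, and reusing models/test_loader/device from my first post; I am also not sure the kernels would actually overlap if a single WideResNet forward already saturates the GPU):

import torch

streams = [torch.cuda.Stream() for _ in models]

with torch.no_grad():
    for input, target in test_loader:
        input, target = input.to(device), target.to(device)
        torch.cuda.synchronize()  # make sure the transfer is done before forking streams
        outputs = [None] * len(models)
        for i, (model, stream) in enumerate(zip(models, streams)):
            with torch.cuda.stream(stream):
                outputs[i] = model(input)
        torch.cuda.synchronize()  # wait for all streams before using the outputs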

For GPU, forget about multiprocessing. It is a very tedious task.
For CPU you can use torch.multiprocessing.
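
A minimal sketch of the CPU route could look like this (one process per model; it assumes the models and the loader from the question can be pickled to the worker processes, and it caps the intra-op threads so the processes do not fight over cores):

import torch
import torch.multiprocessing as mp

def evaluate(model, test_loader, results, idx):
    torch.set_num_threads(1)  # avoid thread oversubscription across processes
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for input, target in test_loader:
            output = model(input)
            correct += (output.argmax(dim=1) == target).sum().item()
            total += target.size(0)
    results[idx] = correct / total

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    results = mp.Manager().dict()
    processes = []
    for idx, model in enumerate(models):
        p = mp.Process(target=evaluate, args=(model, test_loader, results, idx))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()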

Hello Kushaj,

The problem is that this solution runs the 3 models sequentially, not in parallel.