How do I run inference in parallel on the same model?

Hello,

I have a model as follows, with multiple inputs (x1, x2, x3) that need to be fed to the same network model1. The basic idea is to feed them one by one and get the results separately.

My question is: is there a way to speed this up by running the inference in parallel?

Thank you in advance.

class MyModel(nn.Module):
    def __init__(self, model1):
        super().__init__()
        self.model1 = model1

    def forward(self, x1, x2, x3):
        # the same sub-network is applied to each input sequentially
        out1 = self.model1(x1)
        out2 = self.model1(x2)
        out3 = self.model1(x3)
        return out1, out2, out3
  1. What you can do is concatenate x1, x2, and x3 into a single batch (so the effective batch size triples) and run a single forward pass:
    # assuming the first dimension of x1, x2, and x3 is the batch dimension
    b = x1.shape[0]
    # torch.cat joins the tensors along dim 0; torch.vstack takes no dim argument
    inputs = torch.cat((x1, x2, x3), dim=0)
    out = self.model1(inputs)
    out1 = out[:b]
    out2 = out[b:2*b]
    out3 = out[2*b:]
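
Putting it together, here is a minimal self-contained sketch of the batched version (the toy nn.Linear sub-network and the tensor shapes below are only illustrative). Note this only matches the sequential results when model1 treats batch elements independently (e.g. no BatchNorm running in training mode) and when x1, x2, and x3 all have the same shape:

    import torch
    import torch.nn as nn

    class MyModel(nn.Module):
        def __init__(self, model1):
            super().__init__()
            self.model1 = model1

        def forward(self, x1, x2, x3):
            b = x1.shape[0]
            # one forward pass over the concatenated batch instead of three
            out = self.model1(torch.cat((x1, x2, x3), dim=0))
            return out[:b], out[b:2*b], out[2*b:]

    # hypothetical usage with a toy sub-network
    model = MyModel(nn.Linear(8, 4)).eval()
    x1, x2, x3 = (torch.randn(2, 8) for _ in range(3))
    with torch.no_grad():
        out1, out2, out3 = model(x1, x2, x3)
    print(out1.shape)  # torch.Size([2, 4])

If the inputs cannot be batched (different shapes, or the model mixes information across the batch), the sequential version remains the safe fallback; otherwise, especially on a GPU, the single larger forward pass is typically faster than three separate ones.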