Replicating an already trained LSTM model to serve multiple inputs in inference

Hello! Is there an efficient way to load an already trained model multiple times in parallel during inference (i.e., replicate it), so that the replicas essentially share the same weights and characteristics but are fed different, variable-length inputs?

I don't fully understand the context, but one option is to keep a single master process that owns the model. On CPU you can use torch.multiprocessing to spawn worker processes that share the same data (i.e., your model weights) and run on different cores.
On GPU, as far as I understand, you will have to load a copy of the model into each GPU's memory.
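
A minimal sketch of the CPU approach, assuming a placeholder MyModel LSTM wrapper (the class name, layer sizes, and inputs below are illustrative, not from the original post). share_memory() moves the parameters into shared memory, so the worker processes reuse one copy of the weights instead of each getting its own:

import torch
import torch.multiprocessing as mp

class MyModel(torch.nn.Module):  # hypothetical stand-in for your trained LSTM
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

    def forward(self, x):
        out, _ = self.lstm(x)
        return out[:, -1]  # hidden state at the last time step

def worker(model, seq_len):
    # Each process runs inference on its own variable-length input.
    with torch.no_grad():
        x = torch.randn(1, seq_len, 8)
        print(model(x).shape)

if __name__ == '__main__':
    model = MyModel()
    model.eval()
    model.share_memory()  # share the weights across processes instead of copying
    procs = [mp.Process(target=worker, args=(model, n)) for n in (5, 12, 30)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()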


Thank you for your response.
Is there a specific, efficient way to load the model separately onto each GPU?

Use the .to(device) method:

import torch

model = MyModel()  # your trained model class
model = model.to(torch.device('cuda:0'))  # move the weights to GPU 0
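
If you want one replica per GPU, a simple pattern (a sketch, not a definitive recipe) is to deep-copy the trained model and move each copy to its own device. MyModel and the 'model.pt' checkpoint path below are hypothetical:

import copy
import torch

model = MyModel()  # same hypothetical model class as above
model.load_state_dict(torch.load('model.pt'))  # hypothetical checkpoint path
model.eval()

# One independent copy of the weights per visible GPU.
replicas = [copy.deepcopy(model).to(torch.device(f'cuda:{i}'))
            for i in range(torch.cuda.device_count())]

# Each replica can now serve its own variable-length input on its device.
with torch.no_grad():
    for i, replica in enumerate(replicas):
        x = torch.randn(1, 10 + i, 8, device=f'cuda:{i}')
        out = replica(x)

Note that, unlike the CPU case above, the weights here are duplicated rather than shared: each GPU holds its own copy in its own memory, which matches the earlier answer.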