libtorch CPU inference: serving requests from multiple threads

leowgyang · December 26, 2019, 6:22am

I’ve been using libtorch to run my PyTorch model in a single thread.
The process is:

In the online environment, we use multiple threads to handle requests. In TensorFlow, for example, the process is:

model = load_graph() in the parent thread
session = create_session(model) in each child thread. The child threads share the model's weights
run_session in each child thread

How do I implement this structure in libtorch?
Please help me! Thanks.

novioleo · January 9, 2020, 3:49pm

you ask yourself…ahahahahah.
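
For reference, the structure asked about (load once in the parent thread, share the weights, run in each child thread) can be sketched in plain C++. This is only a sketch under stated assumptions: `Model`, `load_model` and `serve_requests` are hypothetical stand-ins, not libtorch types. In libtorch the loading step would presumably be `torch::jit::load(...)` and the per-thread step a call to `forward(...)` on the loaded module. Whether concurrent `forward` calls on one shared module are safe should be checked against the documentation for the libtorch version in use.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for a loaded model. With libtorch this role would be
// played by the module object returned by torch::jit::load().
struct Model {
    std::vector<float> weights;  // read-only after loading

    // Stand-in for the model's forward pass: a dot product with the weights.
    // Assumes input.size() <= weights.size().
    float forward(const std::vector<float>& input) const {
        return std::inner_product(input.begin(), input.end(),
                                  weights.begin(), 0.0f);
    }
};

// Hypothetical loader, meant to be called once in the parent thread.
Model load_model() {
    return Model{{1.0f, 2.0f, 3.0f}};
}

// Starts one worker thread per request. Every worker reads the same Model
// through a const reference, so the weights are shared and never copied.
std::vector<float> serve_requests(
        const Model& model,
        const std::vector<std::vector<float>>& requests) {
    std::vector<float> results(requests.size());
    std::vector<std::thread> workers;
    workers.reserve(requests.size());
    for (std::size_t i = 0; i < requests.size(); ++i) {
        // Each worker writes only its own slot in results, so no lock is needed.
        workers.emplace_back([&model, &requests, &results, i] {
            results[i] = model.forward(requests[i]);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return results;
}
```

A caller would run `Model model = load_model();` once at startup and then pass `model` to `serve_requests` for each batch of incoming requests. A real server would more likely keep a fixed pool of worker threads than spawn one thread per request.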