libtorch CPU inference serving with multiple request threads

I’ve been using libtorch to run my PyTorch model in a single thread.
The process is:

  1. module = torch::jit::load(model_path)
  2. out_tensor = module.forward(inputs)

In our online serving environment, requests are handled by multiple threads. In TensorFlow, for example, the process is:

  1. model = load_graph() in the parent thread
  2. session = create_session(model) in each child thread; all child threads share the model's weights
  3. run_session in each child thread

How do I implement this structure in libtorch?
Please help me! Thanks.

Hi, I ran into the same problem. Have you solved it?

You're asking yourself… ahahahahah.