PyTorch/Caffe2 C++ forward (predict) time cost in multi-threading

When running the Caffe2 forward pass from multiple threads, each call is much slower (roughly twice as slow) than with a single thread or a single instance.
Does anyone know how to deal with this?
Thanks.

The single-thread TEST_benchmark result:

Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 171.063. Iters per second: 5.84581
Operator #0 (conv1, Conv) 43.2579 ms/iter (0.403112 GFLOP, 9.318802 GFLOPS) (29.860160 MB) (0.001080 MB)
Operator #1 (conv1, PRelu) 4.23176 ms/iter
Operator #2 (pool1, MaxPool) 12.1515 ms/iter
Operator #3 (conv2, Conv) 30.6145 ms/iter (0.532466 GFLOP, 17.392603 GFLOPS) (11.832576 MB) (0.005760 MB)
Operator #4 (conv2, PRelu) 1.41119 ms/iter
Operator #5 (conv3, Conv) 60.6044 ms/iter (1.687910 GFLOP, 27.851303 GFLOPS) (23.443200 MB) (0.018432 MB)
Operator #6 (conv3, PRelu) 3.27206 ms/iter
Operator #7 (conv4-1, Conv) 2.92857 ms/iter (0.023443 GFLOP, 8.004998 GFLOPS) (1.465200 MB) (0.000256 MB)
Operator #8 (conv4-2, Conv) 3.5091 ms/iter (0.046886 GFLOP, 13.361366 GFLOPS) (2.930400 MB) (0.000512 MB)
Operator #9 (conv4-3, Conv) 7.33484 ms/iter (0.117216 GFLOP, 15.980716 GFLOPS) (7.326000 MB) (0.001280 MB)
Operator #10 (prob1, Softmax) 1.42719 ms/iter
Time per operator type:
        148.249 ms.     86.826%. Conv
        12.1515 ms.    7.11681%. MaxPool
        8.91501 ms.     5.2213%. PRelu
        1.42719 ms.   0.835873%. Softmax
        170.743 ms in Total
FLOP per operator type:
        2.81103 GFLOP.        100%. Conv
        2.81103 GFLOP in Total
Feature Memory Read per operator type:
        98.6548 MB.        100%. Conv
        98.6548 MB in Total
Feature Memory Written per operator type:
        76.8575 MB.        100%. Conv
        76.8575 MB in Total
Parameter Memory per operator type:
        0.02732 MB.        100%. Conv
        0.02732 MB in Total

When using multiple threads (num_thread=2), the time cost increases.

Steps to reproduce:

@beichen2012
Did you use libtorch? If so, how did you set the number of threads?
Thanks,
Afshin

Please refer to: https://github.com/beichen2012/testMultiThreadCaffe2
and: https://github.com/pytorch/pytorch/issues/15432