PyTorch faster than Libtorch for CNN inference

I have narrowed the comparison down to the code snippets below (v16mms is a list/vector of models):

spdlog::info("Running inference");
for (size_t i = 0; i < v16mms.size(); ++i) {
  outputs[i] = v16mms[i].forward(input).toTensor();
  output += outputs[i];
}
spdlog::info("Done");

VS

logging.info('Running inference')
for i in range(len(v16mms)):
    outputs.append(v16mms[i](images_tensor))
    output += outputs[i]
logging.info('Done')

The two snippets do the same thing: they run the same set of models and add each inference result to output.

My tests show that the PyTorch version takes ~230 ms while the LibTorch version takes ~300 ms. Any idea why the LibTorch version is slower?
(In case you want a minimally reproducible example, you can find the LibTorch file here and the PyTorch file here)

Firstly, if you export the model via JIT and then use it in C++, you should run inference several times before timing it, until the numbers stabilize; the first few runs are slow because the graph still has to be built and optimized.
For example:
Load model successful !
t 277.576ms
t 213.905ms
t 187.971ms
t 86.0545ms
t 47.8444ms
t 47.9361ms
t 50.0757ms
t 60.5665ms
t 46.4637ms
t 46.4323ms
t 46.995ms
t 47.0363ms
Secondly, element-by-element assignment between tensors in C++ is slow; if you want to speed it up, try using raw pointers.

Hi! I'm interested in the tensor-assignment-vs-pointer point. Could you give an example showing what using pointers looks like and why it is faster?

I'm sorry to be so late, but you can check out this post, where the author goes into more detail: Libtorch中tensor读写方法对比 (a comparison of tensor read/write methods in LibTorch) - CSDN博客