I have narrowed the comparison down to the code snippets below (v16mms is the collection of models: a list in Python, a std::vector in C++):
spdlog::info("Running inference");
for (size_t i = 0; i < v16mms.size(); ++i) {
    outputs[i] = v16mms[i].forward(input).toTensor();
    output += outputs[i];
}
spdlog::info("Done");
vs.
logging.info('Running inference')
for i in range(len(v16mms)):
    outputs.append(v16mms[i](images_tensor))
    output += outputs[i]
logging.info('Done')
The two snippets do the same thing: they run the same set of models on the same input and accumulate each inference result into output.
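For context, here is a minimal, self-contained sketch of how I assume the C++ side is set up so the loop above compiles (the model path, model count, and 1x3x224x224 input shape are placeholders of mine; the exact setup is in the linked file):

#include <torch/script.h>
#include <spdlog/spdlog.h>
#include <vector>

int main() {
    torch::NoGradGuard no_grad;  // inference only, so skip autograd bookkeeping

    // v16mms: a vector of TorchScript modules (path and count are placeholders)
    std::vector<torch::jit::script::Module> v16mms;
    for (int i = 0; i < 4; ++i) {
        v16mms.push_back(torch::jit::load("vgg16_scripted.pt"));
        v16mms.back().eval();
    }

    // forward() takes a vector of IValues; one dummy image-sized input
    std::vector<torch::jit::IValue> input{torch::randn({1, 3, 224, 224})};
    std::vector<torch::Tensor> outputs(v16mms.size());
    torch::Tensor output = torch::zeros({1, 1000});  // assumed logits shape

    spdlog::info("Running inference");
    for (size_t i = 0; i < v16mms.size(); ++i) {
        outputs[i] = v16mms[i].forward(input).toTensor();
        output += outputs[i];
    }
    spdlog::info("Done");
    return 0;
}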
My tests show that the PyTorch version takes ~230 ms while the LibTorch version takes ~300 ms. Any idea why the LibTorch version is slower, when I would expect the C++ version to be faster?
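In case the measurement methodology matters, this is the kind of chrono-based timing I have in mind for the C++ loop (a sketch; the warm-up pass is my addition for illustration, and the exact measurement code is in the linked file):

// assumes v16mms, input, outputs, output are set up as in the sketch above
#include <chrono>

for (auto& m : v16mms) {
    m.forward(input);  // warm-up pass so first-run allocation cost isn't timed
}

auto t0 = std::chrono::steady_clock::now();
for (size_t i = 0; i < v16mms.size(); ++i) {
    outputs[i] = v16mms[i].forward(input).toTensor();
    output += outputs[i];
}
auto t1 = std::chrono::steady_clock::now();
spdlog::info("Inference took {} ms",
             std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());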
(In case you want a minimal reproducible example, you can find the LibTorch file here and the PyTorch file here.)