Hi, I’d like to know whether libtorch is necessarily faster than Python. For example, if I run VGG16 or ResNet50 on the CPU, how much faster can libtorch be than Python?
When I ran this experiment, the timings were very close. But when I tested the addmm function, libtorch was 3~5x faster than Python. Is this normal? Why does a pretrained model’s performance differ so much from a single function’s?
Well, so a cartoon is that Python introduces a roughly constant overhead per operation. When you have a chain of relatively cheap operations (say, like the lltm example in the C++ extension tutorial), the raw benefit of moving to C++ is about 10%. When you have more expensive operations (and I’d expect the typical convnets on not too small inputs to be so), the gain you might have is much smaller, because the fixed overhead is a smaller fraction of the total run time.
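To put the cartoon into numbers, here is a minimal sketch of the constant-overhead model. The function name `speedup` and all the microsecond figures are made up for illustration, they are not measurements of addmm, VGG16 or ResNet50.

```python
def speedup(compute_us, python_overhead_us, cpp_overhead_us=0.0):
    """Ratio of Python time to C++ time for one operation.

    Cartoon model: time per op = actual compute + fixed per-call overhead.
    All inputs are hypothetical values in microseconds.
    """
    python_time = compute_us + python_overhead_us
    cpp_time = compute_us + cpp_overhead_us
    return python_time / cpp_time


# A tiny op (think: addmm on small matrices), where overhead dominates.
print(f"cheap op:     {speedup(2.0, 6.0):.4f}x")

# A heavy op (think: a convolution on a large input), where compute dominates.
print(f"expensive op: {speedup(5000.0, 6.0):.4f}x")
```

With these illustrative numbers the cheap op comes out 4x faster without the Python overhead, while the expensive op gains about 0.1%. That is the same pattern as in the question: a big ratio for a single small function, nearly identical timings for a whole convnet.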
There are many factors and pitfalls to properly benchmarking and comparing things, so it’s hard to tell without a detailed description of what you did.
Don’t feel bad about it, though! Many of us have forgotten a torch.cuda.synchronize(), accidentally used a debug build, or (speaking for myself) simply misread measurements late at night and dropped what would have been a promising optimization…
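For what it’s worth, a rough sketch of a timing helper that avoids a couple of these pitfalls (no warm-up, no synchronization) could look like the following. The helper name `benchmark` and its parameters are my own invention, not a PyTorch API, and the workload here is just a stand-in so the snippet runs without torch.

```python
import statistics
import time


def benchmark(fn, sync=None, warmup=3, iters=10):
    """Return a list of per-iteration wall-clock times (seconds) for fn().

    sync, if given, is called right before starting and right before
    stopping the clock, so asynchronously queued work is counted.
    (Hypothetical helper for illustration.)
    """
    # Warm-up runs are not timed: first calls often pay one-off costs.
    for _ in range(warmup):
        fn()

    times = []
    for _ in range(iters):
        if sync is not None:
            sync()  # make sure nothing from earlier is still running
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn() to finish
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # Stand-in workload; replace with the model call you want to time.
    samples = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"median: {statistics.median(samples) * 1e3:.3f} ms")
```

On a GPU you would pass `sync=torch.cuda.synchronize` and probably run the model under `torch.no_grad()`; on the CPU no sync is needed. PyTorch also ships `torch.utils.benchmark.Timer`, which takes care of much of this for you. Reporting the median rather than a single run makes the late-night misreadings a bit less likely.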