Libtorch spends a lot of time when the input size changes

Env: C++, CUDA 9, self-compiled libtorch from source (1.1.0a0+deadf3b), traced model
Problem: libtorch spends a lot of time reloading the model whenever I change the input size during inference (in fact, it seems the same in Python). If I don't change the input size, inference is fast after the first, time-consuming pass. What can I do to speed up inference if the input size changes all the time?
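For reference, here is a minimal sketch of a timing loop that exposes the behavior. The model path and the two sizes are placeholders, and the API shown (torch::jit::load returning a Module by value, torch::cuda::synchronize) is the one from recent libtorch releases, so 1.1.0-era code may differ slightly:

#include <torch/script.h>
#include <torch/cuda.h>
#include <chrono>
#include <iostream>
#include <vector>

int main() {
  // Hypothetical traced model; replace "model.pt" with your own file.
  torch::jit::script::Module module = torch::jit::load("model.pt");
  module.to(torch::kCUDA);

  // Two different input sizes: each new size pays the setup cost again.
  std::vector<std::vector<int64_t>> sizes = {{1, 3, 512, 960},
                                             {1, 3, 1440, 1000}};
  for (const auto& s : sizes) {
    auto input = torch::randn(s, torch::kCUDA);
    auto t0 = std::chrono::steady_clock::now();
    module.forward({input});
    torch::cuda::synchronize();  // wait for the GPU before stopping the clock
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "cost: "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n";
  }
}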

What do you mean by changing the input size?

When I do inference, the size of each test input is different. For example, the first one is [1, 3, 512, 960], and the second one may be [1, 3, 1440, 1000]…

I tried c10::cuda::CUDACachingAllocator::emptyCache(); to see whether it would make a difference, but it did not help.
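For anyone searching later, this is roughly where the call goes, as a sketch; it returns cached but unused blocks to the driver, which can lower the reported memory but does not avoid the per-size setup cost:

#include <c10/cuda/CUDACachingAllocator.h>

// Release cached, unused GPU blocks between inferences.
c10::cuda::CUDACachingAllocator::emptyCache();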

I also found that the C++ version occupies more GPU memory than the Python one. I'd like to know why.

Hm, that seems unlikely, as the C++ API is really just a wrapper: the code it interfaces with is the same code the Python API uses. Does the Python equivalent of your code also slow down once you change the input sizes?

Yes, it does! And I found that C++ uses more GPU memory and has higher inference time than Python.

Do you have cuDNN benchmarking mode turned on (torch.backends.cudnn.benchmark = True in Python; I'm not sure about the C++ API)? If so, PyTorch searches for the fastest convolution algorithm for each input size, and that can slow you down.
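In C++ the same switch is exposed through the ATen global context; a minimal sketch:

#include <ATen/Context.h>

// C++ equivalent of torch.backends.cudnn.benchmark = True in Python.
at::globalContext().setBenchmarkCuDNN(true);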

Thanks for your reply. I have tried torch.backends.cudnn.benchmark = True/False in both C++ and Python, and it makes no difference whether the benchmark value is set to True or False.

I also find that once an input image of a given size has been run through the model, later inputs with the same size are inferred quickly (see the warm-up sketch after the log below).

--debug-- input_concat shape: torch.Size([1, 3, 2272, 1280])
cost: 4.270411252975464
--debug-- input_concat shape: torch.Size([1, 3, 2272, 1280])
cost: 0.027519702911376953
--debug-- input_concat shape: torch.Size([1, 3, 2272, 1280])
cost: 0.058194875717163086
--debug-- input_concat shape: torch.Size([1, 3, 2272, 1280])
cost: 0.04706835746765137
--debug-- input_concat shape: torch.Size([1, 3, 2272, 1280])
cost: 0.048264265060424805
--debug-- input_concat shape: torch.Size([1, 3, 2272, 1280])
cost: 0.04921388626098633
--debug-- input_concat shape: torch.Size([1, 3, 1696, 1280])
cost: 3.20564603805542
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 3.278776168823242
--debug-- input_concat shape: torch.Size([1, 3, 1696, 1280])
cost: 0.03748798370361328
--debug-- input_concat shape: torch.Size([1, 3, 1280, 1696])
cost: 3.295097589492798
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.032279014587402344
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.05348610877990723
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.04692506790161133
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.03901052474975586
--debug-- input_concat shape: torch.Size([1, 3, 2272, 1280])
cost: 0.05834245681762695
--debug-- input_concat shape: torch.Size([1, 3, 1696, 1280])
cost: 0.034003496170043945
--debug-- input_concat shape: torch.Size([1, 3, 1696, 1280])
cost: 0.04744458198547363
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.04431772232055664
--debug-- input_concat shape: torch.Size([1, 3, 2272, 1280])
cost: 0.06284832954406738
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.04384565353393555
--debug-- input_concat shape: torch.Size([1, 3, 1280, 1712])
cost: 2.316293716430664
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.027036666870117188
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.050749778747558594
--debug-- input_concat shape: torch.Size([1, 3, 1280, 1696])
cost: 0.047841787338256836
--debug-- input_concat shape: torch.Size([1, 3, 1280, 1952])
cost: 3.9861865043640137
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.027935028076171875
--debug-- input_concat shape: torch.Size([1, 3, 1280, 1280])
cost: 2.379218101501465
--debug-- input_concat shape: torch.Size([1, 3, 1696, 1280])
cost: 0.02846550941467285
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.04407143592834473
--debug-- input_concat shape: torch.Size([1, 3, 2272, 1280])
cost: 0.06075263023376465
--debug-- input_concat shape: torch.Size([1, 3, 1920, 1280])
cost: 3.509601593017578
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.021055936813354492
--debug-- input_concat shape: torch.Size([1, 3, 1696, 1280])
cost: 0.03338050842285156
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.04442191123962402
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.04460453987121582
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.03949451446533203
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.052010297775268555
--debug-- input_concat shape: torch.Size([1, 3, 1280, 1696])
cost: 0.050638675689697266
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.058187246322631836
--debug-- input_concat shape: torch.Size([1, 3, 1712, 1280])
cost: 0.054537296295166016
avg cost: 0.6914166688919068
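Since the cost is paid once per distinct shape, one workaround is to warm the model up on every size you expect before timing or serving real inputs. A minimal sketch, assuming the set of sizes is known in advance (the sizes below are just the recurring ones from the log above):

#include <torch/torch.h>
#include <torch/script.h>
#include <vector>

void warm_up(torch::jit::script::Module& module,
             const std::vector<std::vector<int64_t>>& sizes) {
  torch::NoGradGuard no_grad;   // inference only, no autograd bookkeeping
  for (const auto& s : sizes) {
    auto dummy = torch::randn(s, torch::kCUDA);
    module.forward({dummy});    // first pass per size pays the setup cost
  }
  torch::cuda::synchronize();
}

// Usage:
// warm_up(module, {{1, 3, 2272, 1280}, {1, 3, 1712, 1280}, {1, 3, 1696, 1280}});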

I have the same problem: libtorch uses much more GPU memory than Python with the same image size. Have you figured out how to deal with it? Thank you.

Actually, I have not figured it out yet.

Putting torch::NoGradGuard no_grad; in the scope is the first solution; I haven't found much else that works yet, though… I would love to be able to easily quantize the model when tracing it for C++.
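For reference, a minimal sketch of scoping that guard around the forward call:

#include <torch/torch.h>
#include <torch/script.h>

torch::Tensor infer(torch::jit::script::Module& module,
                    const torch::Tensor& input) {
  // Disables gradient tracking inside this scope, so autograd
  // buffers are not kept alive on the GPU between calls.
  torch::NoGradGuard no_grad;
  return module.forward({input}).toTensor();
}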