Forward processing speed slows down increasingly during iterative runs [libtorch, torchscript]

Thank you @ptrblck, so there is nothing wrong with the program or the model. I tried it on Google Colab and it ran stably there as well. I did notice a warning when I compiled my program on my machine:

CMake Warning at CMakeLists.txt:28 (add_executable):
  Cannot generate a safe runtime search path for target my_program because
  there is a cycle in the constraint graph:

    dir 0 is [/usr/lib/libtorch_abi11_14/lib]
    dir 1 is [/usr/local/cuda-10.1/lib64/stubs]
    dir 2 is [/usr/local/cuda-10.1/lib64]
      dir 3 must precede it due to runtime library [libnvToolsExt.so.1]
    dir 3 is [/usr/local/cuda/lib64]
      dir 2 must precede it due to runtime library [libnvToolsExt.so.1]

  Some of these libraries may not be found correctly.
 

Could it be the cause of the fluctuation?

Not 100% sure, but I doubt it.
Are you using the latest stable libtorch release? If not, could you update to it with CUDA 10.2?


Thanks, Peter. I used libtorch 1.6 with CUDA 10.1. I will try 10.2 and report the result here.

Hi, I have seen similar behavior on my PC. It was caused by CPU and GPU throttling. Check your CPU and GPU performance while the program is running.

Thank you for your response. Is there a way to disable the throttling mechanism?

Hi, I am using ThrottleStop for CPU and Asus GPU Tweak II for GPU.

Sorry to revive this; I recently ran into the exact same problem as @AppleTree.

In my case, this was due to a memory-layout issue: I used OpenCV Mat objects as inputs and had to permute the channels to create the tensor. The permutation made the tensor non-contiguous, and this significantly slowed down inference after a few iterations.

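For reference, this is roughly how the problem showed up on my side. A minimal sketch of the conversion (the helper name matToTensor and the float/normalization details are just assumptions for illustration): permute() returns a non-contiguous view, so calling .contiguous() before inference avoids the slowdown.

#include <opencv2/opencv.hpp>
#include <torch/torch.h>

// Convert an 8-bit 3-channel cv::Mat (HWC) into a float CHW CUDA tensor.
// from_blob only wraps the Mat's memory; the permuted view is non-contiguous,
// so .contiguous() (and the copy to CUDA) materializes a properly laid-out tensor.
torch::Tensor matToTensor(const cv::Mat& img) {
    cv::Mat img_float;
    img.convertTo(img_float, CV_32FC3, 1.0 / 255.0);  // to float in [0, 1]
    torch::Tensor t = torch::from_blob(
        img_float.data, {img_float.rows, img_float.cols, 3}, torch::kFloat32);
    t = t.permute({2, 0, 1});                          // HWC -> CHW (non-contiguous view)
    return t.contiguous().unsqueeze(0).to(torch::kCUDA);  // force contiguous layout, add batch dim
}
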
I guess in the case of @AppleTree, you could run something like:

#include <torch/script.h>
#include <ctime>
#include <iostream>

// NetworkPath is the path to the TorchScript model; aa_cu is the input tensor already on the GPU.
torch::jit::script::Module module = torch::jit::load(NetworkPath, torch::kCUDA);
torch::Tensor outputArr[100];
aa_cu = aa_cu.contiguous();  // make sure the input is contiguous before the loop
std::cout << "Start Iteration run" << std::endl;
for (int i = 0; i < 100; i++) {
    clock_t t = clock();
    outputArr[i] = module.forward({ aa_cu }).toTensor();
    std::cout << "[Time] Iter : " << i + 1 << " // Time(ms) : "
              << 1000.0 * (clock() - t) / CLOCKS_PER_SEC << std::endl;
}