Thank you @ptrblck. So there is nothing wrong with the program or the model; I also tried it on Google Colab and it runs stably there as well. I noticed a warning when I compiled my program on my machine:
CMake Warning at CMakeLists.txt:28 (add_executable):
Cannot generate a safe runtime search path for target my_program because
there is a cycle in the constraint graph:
dir 0 is [/usr/lib/libtorch_abi11_14/lib]
dir 1 is [/usr/local/cuda-10.1/lib64/stubs]
dir 2 is [/usr/local/cuda-10.1/lib64]
dir 3 must precede it due to runtime library [libnvToolsExt.so.1]
dir 3 is [/usr/local/cuda/lib64]
dir 2 must precede it due to runtime library [libnvToolsExt.so.1]
Some of these libraries may not be found correctly.
Sorry to revive this, I recently ran into the exact same problem as @AppleTree.
In my case, this was due to a memory layout issue: I used OpenCV Mat objects as inputs and had to permute the channels to create the tensor. However, for some reason, the permutation made the tensor non-contiguous, and this significantly slowed down inference after a few iterations.
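To illustrate the layout issue: a multi-channel OpenCV Mat stores its pixels interleaved (height, width, channel), while the model typically expects channel-first data. Permuting a tensor only changes the strides of the view, whereas calling `.contiguous()` materialises a copy in the new order. Below is a minimal sketch in plain C++ of what that copy amounts to. The helper name `hwc_to_chw` and the use of `std::vector<float>` are assumptions for illustration only; this is not OpenCV or libtorch API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenCV or libtorch): copies an interleaved
// HWC buffer into a planar CHW buffer. This is roughly the data movement that
// permuting to channel-first followed by .contiguous() performs on a tensor.
std::vector<float> hwc_to_chw(const std::vector<float>& hwc,
                              std::size_t height,
                              std::size_t width,
                              std::size_t channels) {
    // The input must hold exactly one value per (y, x, c) triple.
    assert(hwc.size() == height * width * channels);

    std::vector<float> chw(hwc.size());
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t y = 0; y < height; ++y) {
            for (std::size_t x = 0; x < width; ++x) {
                // Source index walks the interleaved layout,
                // destination index walks the planar layout.
                chw[(c * height + y) * width + x] =
                    hwc[(y * width + x) * channels + c];
            }
        }
    }
    return chw;
}
```

Without such a copy, every read of the permuted view has to jump through memory with a stride of `channels` elements, which is the kind of access pattern that a contiguous copy avoids.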
I guess in the case of @AppleTree, you could run something like:
torch::jit::script::Module module = torch::jit::load(NetworkPath, torch::kCUDA);
torch::Tensor outputArr[100];
// Force a contiguous memory layout once, before the timing loop.
aa_cu = aa_cu.contiguous();
std::cout << "Start Iteration run" << std::endl;
for (int i = 0; i < 100; i++) {
    clock_t t = clock();
    outputArr[i] = module.forward({ aa_cu }).toTensor();
    // clock() returns ticks, so convert the difference to milliseconds.
    std::cout << "[Time] Iter : " << i+1 << " // Time(ms) : " << (clock() - t) * 1000.0 / CLOCKS_PER_SEC << std::endl;
}