Continuing the discussion from Synchronize CUDA calls in Libtorch:
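```cpp
// demo example
timer.start();
// Copy the input tensor to the GPU
auto input_gpu = input_cpu.to(at::kCUDA);
std::vector<torch::jit::IValue> jit_input;
jit_input.push_back(input_gpu);
// Run the TorchScript model, then copy the result back to the CPU
auto output = model->forward(jit_input).toTensor().to(at::kCPU);
timer.stop();
```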
Do I have to synchronize explicitly here, or does the conversion back to the CPU (`.to(at::kCPU)`) already synchronize?
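One way to make the measurement independent of whether `.to(at::kCPU)` blocks is to synchronize explicitly right before each clock reading. The sketch below assumes a hypothetical `SyncTimer` class (not part of libtorch) that takes the synchronization routine as a callback, so it can be written and checked with the standard library alone.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical timer (not a libtorch API): it calls a user-supplied
// synchronization function before each clock reading, so that work
// queued asynchronously (e.g. CUDA kernels) has finished when the
// timestamp is taken.
class SyncTimer {
 public:
  using Clock = std::chrono::steady_clock;

  explicit SyncTimer(std::function<void()> sync) : sync_(std::move(sync)) {}

  void start() {
    sync_();  // drain work queued before the timed region
    begin_ = Clock::now();
  }

  void stop() {
    sync_();  // wait for work launched inside the timed region
    end_ = Clock::now();
  }

  // Elapsed time between start() and stop(), in milliseconds.
  double elapsed_ms() const {
    return std::chrono::duration<double, std::milli>(end_ - begin_).count();
  }

 private:
  std::function<void()> sync_;
  Clock::time_point begin_{};
  Clock::time_point end_{};
};
```

With CUDA, the callback could wrap `cudaDeviceSynchronize()` from the CUDA runtime, or `torch::cuda::synchronize()` in libtorch versions that provide it, e.g. `SyncTimer timer([] { cudaDeviceSynchronize(); });`. Synchronizing in both `start()` and `stop()` keeps earlier GPU work out of the measurement and ensures the forward pass is counted even if the final copy turned out to be non-blocking.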
Thank you.