@Santosh-Gupta Hi, thanks for your work and your proposal. I am very interested in tools that speed up CPU-GPU-CPU data transfers. I have written up the details of my application in this post, where you will find what I have tried so far and a link to the code.
Basically, I'm trying to do real-time control at 100 Hz on constrained devices (Jetson Nano), using PyTorch to compute the control laws on the GPU. I'm sure there are several bottlenecks, but one of them is the CPU-to-GPU and GPU-to-CPU data transfers.
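At 100 Hz, each control step (CPU-to-GPU copy, control-law computation, GPU-to-CPU copy) has a budget of 1 / 100 s = 10 ms. Below is a minimal stdlib-only sketch of how I would check whether a step fits that budget. `run_fixed_rate` and its `step` callback are hypothetical names for illustration, not part of my code or of SpeedTorch; the transfers and the control law would go inside `step`.

```python
import time
from typing import Callable, List, Tuple


def run_fixed_rate(
    step: Callable[[], None], rate_hz: float = 100.0, n_steps: int = 100
) -> Tuple[List[float], int]:
    """Call `step` at a fixed rate and report how long each call took.

    Returns the per-step latencies (in seconds) and the number of steps
    whose duration exceeded the period (1 / rate_hz, i.e. 10 ms at 100 Hz).
    """
    period = 1.0 / rate_hz
    next_deadline = time.perf_counter() + period
    latencies: List[float] = []
    overruns = 0
    for _ in range(n_steps):
        start = time.perf_counter()
        step()  # hypothetical: transfers + control-law computation go here
        elapsed = time.perf_counter() - start
        latencies.append(elapsed)
        if elapsed > period:
            overruns += 1
        # Sleep until the next deadline; if already late, resync to "now"
        # instead of running back-to-back steps to catch up.
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)
            next_deadline += period
        else:
            next_deadline = time.perf_counter() + period
    return latencies, overruns


if __name__ == "__main__":
    lat, missed = run_fixed_rate(lambda: None, rate_hz=100.0, n_steps=100)
    print(f"worst step: {max(lat) * 1e3:.3f} ms, overruns: {missed}")
```

One caveat if `step` uses PyTorch on the GPU: CUDA kernels and `non_blocking` copies are launched asynchronously, so the timing only includes the GPU work if `step` ends with `torch.cuda.synchronize()` or a blocking copy of the result back to the CPU.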
I have started rewriting the code to use torch JIT, but I'm not sure how much of a speedup it will actually give. I would appreciate your support in trying out SpeedTorch (which I read about in your post), and I hope to get a nice performance increase from it.