Anyone working with a CPU -> GPU data pipeline? I am developing a library of methods for faster transfers to the GPU. In some cases, it is 370x faster than using PyTorch's pinned CPU tensors.

@Santosh-Gupta Hi, thanks for your work and proposal. I am very interested in tools to speed up CPU-GPU-CPU data transfers. I have written up the details of my application in this post. There you will find what I have tried so far and access to the code.

Basically, I'm trying to do real-time control at 100 Hz on constrained devices (Jetson Nano), using PyTorch to compute the control laws on the GPU. I am sure there are several bottlenecks, but one of them is the CPU-GPU and GPU-CPU data transfers.
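To put a number on that, here is a minimal sketch of how the round trip could be timed against the 10 ms period that a 100 Hz loop allows. The tensor size and the `on_gpu * 2.0` stand-in for the control law are placeholders, not my actual code, and the helper names `mean_seconds` and `budget_fraction` are made up for this example.

```python
import time


def mean_seconds(fn, repeats=200, warmup=20):
    """Average wall-clock time of fn() over `repeats` calls, after a warm-up."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def budget_fraction(seconds, rate_hz=100.0):
    """Fraction of one control period (1 / rate_hz) taken up by `seconds`."""
    return seconds * rate_hz


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; nothing to measure.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device available; nothing to measure.")
        return

    # Placeholder state vector in pinned (page-locked) host memory.
    state = torch.zeros(64).pin_memory()

    def round_trip():
        on_gpu = state.to("cuda", non_blocking=True)  # CPU -> GPU
        result = on_gpu * 2.0                         # stand-in for the control law
        return result.cpu()                           # GPU -> CPU, waits for the GPU

    t = mean_seconds(round_trip)
    print(f"round trip: {t * 1e3:.3f} ms "
          f"({budget_fraction(t):.1%} of the 100 Hz period)")


if __name__ == "__main__":
    main()
```

The final `.cpu()` call blocks until the GPU work is done, so the measured time covers both transfers plus the placeholder computation.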

I am starting to rewrite the code to use TorchScript (torch.jit), but I am not sure how much of a speedup it will give. I would appreciate your support in trying out SpeedTorch (which I read about in your post), and I hope it brings a nice performance increase.