Moving tensors from GPU to CPU to GPU: time bottleneck

That sounds great!
I enjoyed the GTC 2022 talk "How CUDA Programming Works" by Stephen Jones, in which he goes into more detail about the physical limitations of the GPU (memory bandwidth, compute, occupancy, etc.), so it might be interesting for you as well.
