Bring tensors to GPU in a batch

Iterating over tensors and calling .to("cuda") on each one causes a lot of per-call overhead, and the GPU sits mostly idle during the many small transfers. Is there a way to tell PyTorch to move a batch of tensors to the GPU at once?
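For concreteness, a minimal sketch of the pattern I mean (the tensor count and shapes are made up):

```python
import torch

# Illustrative slow pattern: one .to("cuda") call, and thus one small
# host-to-device copy, per tensor in the list.
tensors = [torch.randn(1024) for _ in range(1000)]
gpu_tensors = [t.to("cuda") for t in tensors]  # many tiny transfers
```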

I think you could use tensordict, but I'm unsure how allocation on the GPU happens under the hood.
https://pytorch.org/tensordict/main/overview.html
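Something like this, as a minimal sketch (key names and shapes are illustrative, and it assumes the tensordict package is installed): group the tensors into a TensorDict, then move them all with a single .to() call.

```python
import torch
from tensordict import TensorDict

# Collect the tensors under one container with a shared batch size.
td = TensorDict(
    {"obs": torch.randn(32, 84, 84), "action": torch.randn(32, 6)},
    batch_size=[32],
)

# One call moves every leaf tensor in the container to the GPU.
td_gpu = td.to("cuda")
```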

Yes, tensordict will execute that somewhat faster by using non_blocking data transfers. Happy to look at the profile if you're willing to share one taken with that lib!
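A sketch of that non_blocking path, assuming tensordict's .to() accepts non_blocking the way torch.Tensor.to does (keys and shapes are again illustrative):

```python
import torch
from tensordict import TensorDict

td = TensorDict(
    {"x": torch.randn(32, 128), "y": torch.randn(32, 1)},
    batch_size=[32],
)

# Pinned (page-locked) host memory is what lets the async copies
# actually overlap with other work on the CUDA stream.
td = td.pin_memory()
td_gpu = td.to("cuda", non_blocking=True)
```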
