Consider the following code:
import torch
x = torch.arange(100, device='cuda').reshape(50, 2)
y = torch.empty(100, device='cuda').reshape(50, 2)
for i, batch in enumerate(x):
    y[i] = batch ** 2
My basic understanding is that CUDA operations run asynchronously on the GPU, and that operations like .item() or print() require a sync to copy data back to host memory.
Does anything in the above code require such a sync? Specifically, I'm worried that either:
(1) using enumerate, which iterates over the GPU tensor but returns ints in host memory, or
(2) using those ints to index a tensor that resides in GPU memory
requires a GPU-to-host sync (which I would hope to avoid). Is that the case here?
Additionally, how would I verify this on my own?
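One way to check this empirically, a sketch assuming PyTorch 1.10 or newer (which is when torch.cuda.set_sync_debug_mode was added): ask PyTorch to warn (or raise) on every implicit device synchronization, then run the loop. The CPU fallback branch only verifies the arithmetic when no GPU is present.

```python
import torch

def squares_loop(x, out):
    # Elementwise square via a Python loop over rows. On a CUDA tensor,
    # each `out[i] = batch ** 2` launches kernels asynchronously;
    # enumerate itself only counts on the host and yields row views.
    for i, batch in enumerate(x):
        out[i] = batch ** 2
    return out

if torch.cuda.is_available():
    # Warn on any implicit GPU-to-host sync ("error" would raise instead).
    torch.cuda.set_sync_debug_mode("warn")
    x = torch.arange(100, device="cuda").reshape(50, 2)
    y = torch.empty(50, 2, dtype=x.dtype, device="cuda")
    squares_loop(x, y)  # if no warning is printed, the loop did not sync
    torch.cuda.set_sync_debug_mode("default")
else:
    # No GPU in this environment: just check the loop's result on CPU.
    x = torch.arange(100).reshape(50, 2)
    y = torch.empty(50, 2, dtype=x.dtype)
    squares_loop(x, y)
```

An alternative is to time the loop with and without an explicit torch.cuda.synchronize() at the end; if the un-synchronized version returns almost immediately while the synchronized one takes measurably longer, the loop itself was not forcing syncs.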
Thanks,
Yiftach