Consider the following code:
```python
import torch

x = torch.arange(100, device='cuda').reshape(50, 2)
y = torch.empty(100, device='cuda').reshape(50, 2)
for i, batch in enumerate(x):
    y[i] = batch ** 2
```
My basic understanding is that CUDA operations run asynchronously on the GPU, and that operations like
print() require a sync to copy data back to host memory.
Does anything in the above code require such a sync? Specifically, I'm worried that either:
(1) enumerate, which iterates over GPU memory but returns ints in host memory, or
(2) using those ints to index a tensor that resides in GPU memory
requires a GPU-to-host sync (which I would hope to avoid). Is that the case here?
Additionally, how would I verify this on my own?
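For reference, here is what I was considering trying myself (a sketch, assuming `torch.cuda.set_sync_debug_mode` works the way I think it does, i.e. it warns or errors whenever an operation forces a device-to-host synchronization):

```python
import torch

if torch.cuda.is_available():
    # Ask PyTorch to warn on any synchronizing call ("error" would raise instead)
    torch.cuda.set_sync_debug_mode("warn")

    x = torch.arange(100, device='cuda').reshape(50, 2)
    y = torch.empty(100, device='cuda').reshape(50, 2)
    for i, batch in enumerate(x):  # iteration yields GPU tensor views; i is a host int
        y[i] = batch ** 2          # does this line trigger a warning?

    # A call that definitely syncs, to confirm the debug mode is active:
    print(y.sum().item())          # .item() copies a scalar back to the host

    # Restore the default behavior
    torch.cuda.set_sync_debug_mode("default")
```

My assumption is that if the loop itself were syncing, I'd see warnings printed for those lines, while the `.item()` call at the end serves as a positive control. Is that a reliable way to check?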