Consider the following code:
```python
import torch

x = torch.arange(100, device='cuda').reshape(50, 2)
y = torch.empty(100, device='cuda').reshape(50, 2)
for i, batch in enumerate(x):
    y[i] = batch ** 2
```
My basic understanding is that CUDA operations run asynchronously on the GPU, and that operations like
print() require a sync to copy data back to host memory.
Does anything in the above code require such a sync? Specifically, I'm worried that either:
(1) enumerate, which iterates over GPU memory but returns ints in host memory, or
(2) using those ints to index a tensor that resides in GPU memory
requires a GPU-to-host sync (which I would hope to avoid). Is that the case here?
Additionally, how would I verify this on my own?
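For reference, here is what I was considering trying myself (a sketch, assuming `torch.cuda.set_sync_debug_mode` works the way I think it does, i.e. it warns or errors whenever an operation forces a device-to-host synchronization):

```python
import torch

if torch.cuda.is_available():
    # Ask PyTorch to warn on any synchronizing call ("error" would raise instead)
    torch.cuda.set_sync_debug_mode("warn")

    x = torch.arange(100, device='cuda').reshape(50, 2)
    y = torch.empty(100, device='cuda').reshape(50, 2)
    for i, batch in enumerate(x):  # iteration yields GPU tensor views; i is a host int
        y[i] = batch ** 2          # does this line trigger a warning?

    # A call that definitely syncs, to confirm the debug mode is active:
    print(y.sum().item())          # .item() copies a scalar back to the host

    # Restore the default behavior
    torch.cuda.set_sync_debug_mode("default")
```

My assumption is that if the loop itself were syncing, I'd see warnings printed for those lines, while the `.item()` call at the end serves as a positive control. Is that a reliable way to check?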