Why do we use torch.cuda.synchronize()?

Why do we use torch.cuda.synchronize()? When we run an operation on a CUDA device, doesn't that mean it has already finished by the time that line of code returns? Should we always wait for the ongoing operations on the GPU?

import torch

# Check if GPU is available
if torch.cuda.is_available():
    # Move tensor to GPU
    device = torch.device("cuda")
    x = torch.randn(1000, 1000, device=device)

    # Launch a matrix multiplication on the GPU (queued asynchronously)
    y = x.matmul(x)

    # Wait for GPU operations to finish
    torch.cuda.synchronize()

You use it to explicitly synchronize the code, e.g. to profile the kernel execution time.
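Here is a minimal timing sketch of that use case (the shapes are arbitrary); without the two synchronize() calls you would only measure the asynchronous kernel launch, not the actual kernel execution:

import time
import torch

device = torch.device("cuda")
x = torch.randn(1000, 1000, device=device)

torch.cuda.synchronize()        # make sure any pending work has finished
start = time.perf_counter()
y = x.matmul(x)                 # the kernel is only launched here
torch.cuda.synchronize()        # block until the matmul has actually finished
print(f"matmul took {(time.perf_counter() - start) * 1e3:.3f} ms")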

No, it hasn't necessarily finished, since CUDA operations are executed asynchronously w.r.t. the CPU.

No, you shouldn't always wait, as that would slow down your code since the CPU would be constantly blocked.
PyTorch adds the needed synchronizations for you, e.g. if you want to print a CUDA tensor.
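As a small illustration (the sizes are arbitrary): the kernel launch returns almost immediately, while reading a value back to the CPU implicitly synchronizes and thus includes the compute time:

import time
import torch

device = torch.device("cuda")
x = torch.randn(4000, 4000, device=device)

start = time.perf_counter()
y = x.matmul(x)                 # returns right away: the launch is asynchronous
launch_time = time.perf_counter() - start

start = time.perf_counter()
value = y[0, 0].item()          # copying to the CPU synchronizes implicitly
fetch_time = time.perf_counter() - start

print(f"launch: {launch_time * 1e3:.3f} ms, fetch incl. compute: {fetch_time * 1e3:.3f} ms")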


Sorry, but I am a bit confused. Suppose we run an operation on the GPU to predict some items and then calculate the loss, e.g.:

y_pred = model(x)

loss = criterion(y_pred, y_true)

So, should we explicitly add a synchronization before calculating the loss? We need to make sure that y_pred is ready, don't we?

Another question: how does CUDA know which operations it should wait for? For example, there could be lots of processes running on the GPU, such as a game, Blender, Adobe applications, or whatever. Which one should it wait for?

Thanks

No, you don’t need to manually synchronize the code, as the next operation will be queued to the same (default) CUDA stream. Explicit synchronizations would be needed if you used custom streams; see the sketch below.
PyTorch creates its own CUDA context for its execution and does not interfere with other applications. The driver schedules the execution.
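As a rough sketch of the custom-stream case (the model is replaced by a plain matmul for brevity), you would order the streams explicitly before consuming the result:

import torch

device = torch.device("cuda")
x = torch.randn(1000, 1000, device=device)
side_stream = torch.cuda.Stream()

# Make the side stream wait until x is ready on the default stream.
side_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side_stream):
    y = x.matmul(x)             # runs on side_stream

# Make the default stream wait for the side stream before using y.
torch.cuda.current_stream().wait_stream(side_stream)
loss = y.sum()                  # now safe to use on the default stream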
