When to call torch.cuda.synchronize()?

Dear All,

I have a strange problem with code that looks like the snippet below:

# model doesn't contain any trainable parameters
def comp(model, x):
    y = model(x)
    g = x.grad
    return x.detach_(), y.detach_(), g

def run(model, x):
    x, y, g = comp(model, x)

    while True:
        # ... plain compute with x, y, and g (no backward() calls) ...
        # ... call some function that computes an SVD ...
        x.copy_(z)  # z: the result of the computation above
        x, y, g = comp(model, x)

The code runs correctly on the CPU; however, to get the correct result on the GPU I have to insert a torch.cuda.synchronize() call. It also works correctly on the GPU if the synchronize call is replaced by a Python sleep call. Could you suggest what problem this may indicate?
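A runnable sketch of the loop pattern described above might look like the following. The model, the update rule, and the step count are hypothetical stand-ins, not the original code; the sketch runs the same loop on the CPU and, if one is available, on the GPU, and compares the results without any explicit synchronize.

```python
import torch

def model(x):
    # Hypothetical stand-in for the parameter-free model: a fixed nonlinear map.
    return torch.tanh(x @ x.T)

def comp(model, x):
    y = model(x)
    g = x.grad  # stays None here because backward() is never called
    return x.detach_(), y.detach_(), g

def run(model, x, steps=5):
    x, y, g = comp(model, x)
    for _ in range(steps):  # bounded loop instead of `while True`
        # Hypothetical update: uses only singular values, because singular
        # vectors are unique only up to sign and may differ between devices.
        s = torch.linalg.svdvals(y)
        z = torch.tanh(x + y / s[0])
        x.copy_(z)
        x, y, g = comp(model, x)
    return x, y

torch.manual_seed(0)
x0 = torch.randn(4, 4, dtype=torch.float64)
x_cpu, y_cpu = run(model, x0.clone())

if torch.cuda.is_available():
    x_gpu, y_gpu = run(model, x0.clone().cuda())
    # .cpu() waits for the queued kernels, so no explicit synchronize is used here.
    print(torch.allclose(x_cpu, x_gpu.cpu()), torch.allclose(y_cpu, y_gpu.cpu()))
```

If a sketch like this prints True on the GPU but the real code does not, the difference lies in the parts the stand-ins replace.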


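If you only use PyTorch's API, you should never need to call synchronize yourself: kernels launched on the same CUDA stream run in order, and operations that bring data back to the CPU (such as .cpu() or .item()) wait for the pending GPU work automatically.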
Can you share a small code sample (one we can run on Colab) that shows the incorrect result?
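For context, one common case where an explicit synchronize is still useful with pure PyTorch code is measuring wall-clock time on the host, because CUDA kernel launches typically return before the kernel has finished. The sketch below (matrix size and the warm-up step are arbitrary choices) shows that synchronize only affects the timing; reading the result back with .cpu() already waits for the GPU.

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def sync():
    # Only needed for timing: block the host until queued GPU kernels finish.
    if device == "cuda":
        torch.cuda.synchronize()

a = torch.randn(1024, 1024, device=device)
_ = a @ a  # warm-up, so one-time initialization is not timed

sync()  # make sure the setup work above is not counted
start = time.perf_counter()
b = a @ a  # on the GPU this typically returns once the kernel is queued
sync()     # without this, the measured time would mostly be launch overhead
elapsed = time.perf_counter() - start

# Reading the result back needs no explicit synchronize: the device-to-host
# copy waits for the matmul queued on the same stream.
result = b.cpu()
print(f"{device}: {elapsed * 1e3:.2f} ms, sum = {result.sum().item():.3f}")
```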