Device context management: make sure to set device before calling cuda sync!

Today I learned that `torch.cuda.synchronize()` creates a CUDA context on device 0 if the current device has not been set:

import subprocess

import torch

torch.cuda.set_device(1)  # comment this out and torch.cuda.synchronize() creates a new context on device 0
data = torch.zeros(1024, 1024, dtype=torch.float32, device="cuda:1")
torch.cuda.synchronize()

# Show which processes hold a context on each GPU
result = subprocess.run(["nvidia-smi"], check=True, capture_output=True)
print(result.stdout.decode("utf-8"))

This is expected, as described in the docs for `torch.cuda.synchronize`:

Parameters

device (torch.device or int, optional) – device for which to synchronize. It uses the current device, given by current_device(), if device is None (default).

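Since you see this behavior only after commenting out the `set_device` call, `synchronize()` falls back to the current device, which defaults to 0. That is why a context appears on device 0 even though the tensor lives on `cuda:1`.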
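If you would rather not change the current device globally, here is a rough sketch of two alternatives: pass the device to `torch.cuda.synchronize` explicitly, or make it current only inside a `torch.cuda.device` block. The helper name `synchronize_tensor_device` is made up for illustration, and the CPU fallback is only there so the snippet also runs on a machine with fewer than two GPUs.

```python
import torch


def synchronize_tensor_device(tensor: torch.Tensor) -> None:
    """Hypothetical helper: synchronize only the CUDA device that holds `tensor`."""
    if tensor.is_cuda:
        # Passing the device explicitly avoids the fallback to current_device(),
        # which is 0 unless set_device() has been called.
        torch.cuda.synchronize(tensor.device)


# Assumption for this sketch: prefer the second GPU, otherwise fall back.
if torch.cuda.device_count() > 1:
    device = torch.device("cuda:1")
elif torch.cuda.is_available():
    device = torch.device("cuda:0")
else:
    device = torch.device("cpu")

data = torch.zeros(1024, 1024, dtype=torch.float32, device=device)
synchronize_tensor_device(data)

# Alternative: make the device current for a limited scope only.
if device.type == "cuda":
    with torch.cuda.device(device):
        torch.cuda.synchronize()  # the current device is `device` inside this block
```

With either variant, `nvidia-smi` should no longer list an extra context on device 0 for this process, provided nothing else in the program touches device 0.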