Are CUDA tensors always updated synchronously? Is there some inherent mechanism for thread safety?
I am updating a tensor from multiple processes without any locking, yet I don't see the errors I would expect from the missing synchronization.
```python
import torch
import torch.multiprocessing as mp

n_processes, n_iterations = 4, 1_000_000

def fn(t):
    # Each worker increments the shared tensor in place.
    for _ in range(n_iterations):
        t += 1

if __name__ == "__main__":
    context = mp.get_context('spawn')
    processes = []
    shared_array = torch.zeros(1, device='cpu')
    shared_array.share_memory_()  # place the tensor in shared memory
    for _ in range(n_processes):
        p = context.Process(target=fn, args=(shared_array,))
        processes.append(p)
        p.start()
    for process in processes:
        process.join()
    print(shared_array)
```
The output is `tensor([3857068.])`, which indicates a lack of thread safety (a thread-safe output would be 4_000_000).
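For reference, serializing the increments with a `multiprocessing` lock should make the CPU version produce the expected count. Below is a minimal sketch of that workaround; the lock-based variant is my own addition for comparison, not part of the original experiment.

```python
import torch
import torch.multiprocessing as mp

n_processes, n_iterations = 4, 1_000_000

def fn(t, lock):
    for _ in range(n_iterations):
        with lock:  # serialize the read-modify-write on the shared tensor
            t += 1

if __name__ == "__main__":
    context = mp.get_context('spawn')
    lock = context.Lock()
    shared_array = torch.zeros(1, device='cpu')
    shared_array.share_memory_()
    processes = [context.Process(target=fn, args=(shared_array, lock))
                 for _ in range(n_processes)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(shared_array)  # should print tensor([4000000.])
```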
shared_array = torch.zeros(1, device = 'cuda:0') results in the output
tensor([4000000.], device='cuda:0') which indicates thread safety.
I observe this behavior consistently across multiple runs on a Tesla V100 with CUDA 11.6 and PyTorch 1.10.2.