aten::copy_ may not be safe when copying a tensor from CPU to device

I have recently been reading the implementation of the PyTorch `copy_` operator (link: PyTorch GitHub - Copy.cu). My understanding is as follows:

  1. When copying a CPU tensor to a device with `non_blocking=True`, the CPU tensor may be released before the asynchronous copy completes, which could cause the `copy_` operator to read freed memory and produce incorrect results.
  2. When the CPU tensor is in pinned memory, the code at PyTorch GitHub - Copy.cu#L256C5-L256C37 takes effect and ensures that the CPU tensor's buffer is released only after the copy has consumed it, so the `copy_` operator remains correct.

My question is: is there really a bug when copying a CPU tensor to a device?

Here is my test code:

import torch

def copy_tensor(device_tensor):
    # cpu_tensor is pageable (pin_memory=False) and goes out of scope as soon as
    # this function returns, while the non-blocking copy may still be in flight.
    cpu_tensor = torch.empty(10000, 10000, dtype=torch.float32, pin_memory=False)
    device_tensor.copy_(cpu_tensor, non_blocking=True)


def main():
    device_tensor = torch.empty(10000, 10000, dtype=torch.float32, device='cuda')
    copy_tensor(device_tensor)


if __name__ == "__main__":
    main()
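
For what it's worth, here is a defensive variant of the snippet above (a sketch, not an authoritative fix): allocate the staging tensor in pinned memory so it goes through PyTorch's caching host allocator, which should keep the buffer alive until the pending copy completes; and synchronize before reading the device tensor.

```python
import torch

def copy_tensor_safe(device_tensor):
    # Allocate the staging tensor in pinned (page-locked) memory. Pinned
    # allocations go through PyTorch's caching host allocator, which records
    # a CUDA event when the tensor is freed and will not hand the buffer out
    # again until the in-flight async copy has completed.
    cpu_tensor = torch.empty(10000, 10000, dtype=torch.float32,
                             pin_memory=True).fill_(1.0)
    device_tensor.copy_(cpu_tensor, non_blocking=True)
    # cpu_tensor is released here, but the allocator keeps its backing
    # buffer alive until the copy stream has consumed it.

if torch.cuda.is_available():
    dev = torch.empty(10000, 10000, dtype=torch.float32, device="cuda")
    copy_tensor_safe(dev)
    torch.cuda.synchronize()  # wait for the async copy before reading
    assert torch.all(dev == 1.0)
```

An alternative that avoids pinned memory entirely is to keep a reference to the CPU tensor alive until after a `torch.cuda.synchronize()`.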

Do you see any issues or is this a theoretical concern?

It is a theoretical concern. I tried to write code that reproduces the issue, but I was not successful.

The code is as follows:

import torch

def copy_tensor(device_tensor, val):
    # The pageable staging tensor is freed when this function returns,
    # before the non-blocking copy is guaranteed to have completed.
    cpu_tensor = torch.empty(10000, 10000, dtype=torch.float32, pin_memory=False).fill_(val)
    device_tensor.copy_(cpu_tensor, non_blocking=True)


def main():
    tensor_list = []
    for i in range(50):
        device_tensor = torch.empty(10000, 10000, dtype=torch.float32, device='cuda')
        copy_tensor(device_tensor, float(i))
        tensor_list.append(device_tensor)
    
    for i in range(50):
        assert torch.all(tensor_list[i] == float(i)), f"i = {i}"


if __name__ == "__main__":
    main()
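
For contrast, here is a pinned-memory variant of the repro above (a sketch; it assumes a CUDA device is available). With `pin_memory=True`, the event-recording path from point 2 applies, so each staging buffer is guaranteed to outlive its async copy:

```python
import torch

def copy_tensor_pinned(device_tensor, val):
    # pin_memory=True routes the allocation through the caching host
    # allocator, which records a CUDA event when the tensor is freed and
    # only reuses the buffer after the pending copy has completed.
    cpu_tensor = torch.empty(10000, 10000, dtype=torch.float32,
                             pin_memory=True).fill_(val)
    device_tensor.copy_(cpu_tensor, non_blocking=True)

def main():
    tensors = []
    for i in range(50):
        t = torch.empty(10000, 10000, dtype=torch.float32, device="cuda")
        copy_tensor_pinned(t, float(i))
        tensors.append(t)
    torch.cuda.synchronize()  # drain all outstanding copies
    for i, t in enumerate(tensors):
        assert torch.all(t == float(i)), f"i = {i}"

if __name__ == "__main__" and torch.cuda.is_available():
    main()
```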

Could an expert who is familiar with the PyTorch `copy_` operator help me take a look at this issue?

There is some discussion at [Bug] Data on CPUs Are Not Synchronized Before Subsequent Operations · Issue #127612 · pytorch/pytorch · GitHub which resolved my doubts.