Are these the same in effect?

I checked some threads, and need some clarification on these:

torch.zeros(100, device="cuda")
torch.zeros(100).to("cuda")

Both of these will end up on the GPU as far as I know, but is there any difference?

If the tensor does not require gradients, both will yield the same result.
However, if it does, you should stick to the first approach, since the second one creates a non-leaf variable: the .to("cuda") call is tracked by autograd, so the GPU tensor is the result of an operation rather than a leaf.
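
A quick way to see this (a minimal sketch, assuming a CUDA device is available):

import torch

# Created directly on the GPU: a leaf tensor, so its .grad is populated after backward().
a = torch.zeros(100, device="cuda", requires_grad=True)
print(a.is_leaf)  # True

# Created on the CPU and then moved: .to() is an autograd-tracked op, so the result is non-leaf.
b = torch.zeros(100, requires_grad=True).to("cuda")
print(b.is_leaf)  # False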

Nice feedback.

What about the memory copy operation? Is there any difference in that respect? Which one is more efficient?

The first one should be more efficient, as the tensor will be directly created on the device, while the other will be created on the CPU first, then pushed onto the GPU.
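
You can measure this roughly yourself (a sketch only; exact timings depend on the GPU, driver, and tensor size):

import time
import torch

n = 10_000_000

# Warm up CUDA so one-time initialization doesn't skew the numbers.
torch.zeros(1, device="cuda")
torch.cuda.synchronize()

t0 = time.perf_counter()
a = torch.zeros(n, device="cuda")  # allocated directly on the GPU
torch.cuda.synchronize()
t1 = time.perf_counter()

b = torch.zeros(n).to("cuda")      # allocated on the CPU, then copied to the GPU
torch.cuda.synchronize()
t2 = time.perf_counter()

print(f"direct on GPU: {t1 - t0:.6f}s, via CPU: {t2 - t1:.6f}s")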


Great. It is hard for me to think about this, since I am heavily biased toward CPU processing…

This means that with torch.zeros(100, device="cuda") the tensor will be created directly on the GPU, and the CPU will only issue the instruction to create it (a CUDA kernel launch).

And in the second case we will create the tensor on the CPU first and then copy it to the GPU (the .to() API). Right?
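
The copy semantics can be checked directly (a small sketch; assumes a CUDA device):

import torch

x_cpu = torch.zeros(100)           # allocated in host (CPU) memory
x_gpu = x_cpu.to("cuda")           # copies the data into device (GPU) memory
print(x_cpu.device, x_gpu.device)  # cpu cuda:0 -- the original tensor stays on the CPU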