Performance difference between torch.zeros((...), device=dev) and torch.zeros((...)).to(dev)?

bananacode · October 17, 2018, 6:40am

I’ve got two questions regarding performance.

1. Is there a performance difference between creating a tensor and then sending it to the device with the “to” function, versus specifying the device directly during creation?
2. When a tensor is on the GPU and I switch its data type (e.g. from int to float), is the cast done directly on the device, or is the tensor first moved back to the CPU before casting?

SimonW · October 17, 2018, 7:21am

1. Yes. The first creates the tensor directly on the given device; the other creates it on the CPU and then does a copy (if dev is not the CPU).

2. The cast is done directly on the device.
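
To make the two answers concrete, here is a minimal sketch of both creation paths and of a dtype cast on a device tensor. The `shape` value and the CPU fallback are assumptions added for illustration so the snippet also runs on a machine without a GPU; they are not from the thread.

```python
import torch

# Assumption: use a GPU when one is visible, otherwise fall back to the CPU
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
shape = (1024, 1024)  # hypothetical size, chosen only for illustration

# Allocated directly on `dev`; no intermediate CPU tensor is created
a = torch.zeros(shape, device=dev)

# Allocated on the CPU first, then copied to `dev`
# (when `dev` is the CPU, .to() returns the same tensor without copying)
b = torch.zeros(shape).to(dev)

# A dtype conversion returns a tensor on the same device as its input
c = torch.ones(shape, dtype=torch.int32, device=dev)
d = c.float()

print(a.device, b.device, d.device, d.dtype)
```

Both `a` and `b` end up as identical tensors on the same device; only the route taken to get there differs.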

bananacode · October 17, 2018, 8:01am

Thanks for the rapid response. So to confirm, torch.zeros((…), device=dev) is the faster way?

SimonW · October 17, 2018, 11:53pm

Yes, that is correct.
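
For anyone who wants to check this on their own hardware, a rough timing sketch follows. The helper names `sync` and `time_it`, the tensor size, and the repeat count are all hypothetical choices, and the numbers will vary by machine, so treat it as a starting point rather than a benchmark.

```python
import time

import torch

# Assumption: use a GPU when one is visible, otherwise fall back to the CPU
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
shape = (2048, 2048)  # hypothetical size


def sync():
    # CUDA work is queued asynchronously, so wait for it before reading the clock
    if dev.type == "cuda":
        torch.cuda.synchronize()


def time_it(fn, repeats=20):
    fn()  # warm-up call (CUDA context creation, allocator caching)
    sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    sync()
    return (time.perf_counter() - start) / repeats  # mean seconds per call


t_direct = time_it(lambda: torch.zeros(shape, device=dev))
t_copy = time_it(lambda: torch.zeros(shape).to(dev))
print(f"direct: {t_direct * 1e3:.3f} ms, create-then-copy: {t_copy * 1e3:.3f} ms")
```

On a CPU-only machine the two timings should be close, since `.to(dev)` does not copy when the tensor is already on the target device.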