I’ve got two questions regarding performance.
- Is there a performance difference between creating a tensor and then sending it to the device with the “to” function, versus specifying the device directly at creation?
- When a tensor is on the GPU and I switch its data type (e.g., from int to float), is the cast done directly on the device, or is the tensor first moved back to the CPU before casting?
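
For concreteness, here is a minimal sketch of the patterns in question. It assumes PyTorch, which the post does not name explicitly; the shapes and variable names are purely illustrative, and it falls back to the CPU when no GPU is available.

```python
import torch

# Use the GPU if one is available; otherwise fall back to the CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Question 1, variant A: create the tensor on the CPU, then copy it to the device with .to().
a = torch.zeros(1000, 1000).to(device)

# Question 1, variant B: pass the device at creation time.
b = torch.zeros(1000, 1000, device=device)

# Question 2: change the dtype of a tensor that already lives on the device.
x = torch.ones(1000, dtype=torch.int32, device=device)
y = x.float()  # shorthand for x.to(torch.float32)

print(a.device, b.device)
print(x.dtype, "->", y.dtype, "on", y.device)
```

Printing `y.device` shows where the cast result lives, and timing variants A and B (with `torch.cuda.synchronize()` around the timed region on a GPU) would be one way to measure the first question empirically.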