Device to host transfer

When running a module on a device, e.g. a GPU or TPU, does the output stay on the device, or is it always sent back to the host CPU by default?

The output always stays on the device. It is not transferred back to CPU host memory unless you call .cpu() (or .to(device='cpu')) on the output Tensor.
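To illustrate, here is a minimal sketch assuming PyTorch. The toy `Linear` module, the tensor shapes, and the CPU fallback are my own assumptions for the example, not part of the original question:

```python
import torch

# Use a GPU if one is available; otherwise fall back to the CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy module and input, both placed on the chosen device.
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(3, 4, device=device)

# The forward pass produces its output on the same device as the module and input.
out = model(x)
print(out.device)

# Explicitly move the result to host memory (returns the same tensor if already on CPU).
# Equivalent call: out.to(device="cpu")
out_cpu = out.cpu()
print(out_cpu.device)
```

On a machine with a GPU, the first print should show a CUDA device and the second should show `cpu`; nothing is copied to the host until `.cpu()` is called.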
