When does D2H happen?

Although I moved both the model and the input to the GPU with to(at::kCUDA), I was able to print the output without an explicit device-to-host (D2H) memcpy.
How is this possible?
Is the copy performed internally? If so, where does it happen?

It should happen in Formatting.cpp, where the tensor is converted with to(kCPU, kDouble) before it is printed.