What is the effect of criterion.to(device)?

Usually, when we want to train on a certain device, we call xxx.to(device). A typical procedure is

x = x.to(device)
y = y.to(device)
model = model.to(device)

where you move the data and the model to the desired device.

However, I noticed that some people also move the criterion (loss function) to the device:
criterion = criterion.to(device)

Is this necessary? Will the whole training process still run on the GPU if I only move the data and the model to the GPU, as shown in the first three lines? What’s the benefit of moving the criterion to the given device?

Whether calling criterion.to(device) is necessary depends on the criterion you use, in particular on whether it is stateful, i.e. whether it contains internal tensors etc.
If it is stateful and the line is missing, you should get a device mismatch error. So if you are not getting any errors, you can skip this line of code; for a stateless criterion the call is a no-op anyway.
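
As a rough illustration of the difference, here is a minimal sketch that compares a stateless criterion with a stateful one. The choice of nn.CrossEntropyLoss, the weight values, and the tensor shapes are my own assumptions for the example and are not taken from the question. The sketch falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

# Use the GPU if one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stateless criterion: it holds no parameters or buffers,
# so .to(device) would have nothing to move.
plain = nn.CrossEntropyLoss()
plain_state = list(plain.parameters()) + list(plain.buffers())

# Stateful criterion: the class weights (illustrative values) are
# stored on the module as a buffer named "weight".
weighted = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0]))
weighted_state = [name for name, _ in weighted.named_buffers()]

# .to(device) moves that buffer together with the module.
weighted = weighted.to(device)

# Dummy batch: 4 samples, 3 classes, created directly on the device.
logits = torch.randn(4, 3, device=device)
targets = torch.tensor([0, 1, 2, 1], device=device)

loss_plain = plain(logits, targets)        # works without plain.to(device)
loss_weighted = weighted(logits, targets)  # relies on the weight being on `device`

print(plain_state)      # no internal tensors
print(weighted_state)   # names of the internal tensors
print(loss_plain.device, loss_weighted.device)
```

On a GPU machine, leaving out weighted.to(device) in this sketch should produce the device mismatch error mentioned above, while the unweighted criterion runs either way.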