Usually, when we want to train on a certain device, we call xxx.to(device). A typical procedure is
x = x.to(device)
y = y.to(device)
model = model.to(device)
where you move the data and the model to the desired device.
However, I noticed that some people also move the criterion (loss function) to the device:
criterion = criterion.to(device)
Is this necessary? Will the whole training process still run on the GPU if I only move the data and the model to the GPU, as shown in the first three lines? What’s the benefit of moving the criterion to the given device?
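One way to probe this, as a minimal sketch rather than a definitive answer: calling .to(device) on a module moves that module's parameters and buffers, so it only matters for a criterion that holds tensors of its own. The snippet below assumes nn.CrossEntropyLoss as the criterion and uses a made-up three-class weight tensor and random inputs; none of these come from my actual setup.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the check also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assumed criterion without any state: no parameters, no buffers.
plain = nn.CrossEntropyLoss()
plain_state = list(plain.parameters()) + list(plain.buffers())

# Same criterion with (hypothetical) class weights: the weight tensor
# is registered on the module as a buffer.
weighted = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0]))
weighted_state = list(weighted.parameters()) + list(weighted.buffers())

print("tensors held by plain criterion:", len(plain_state))
print("tensors held by weighted criterion:", len(weighted_state))

# Illustrative inputs created directly on the target device.
logits = torch.randn(4, 3, device=device)
targets = torch.randint(0, 3, (4,), device=device)

# The stateless criterion was never moved, yet the loss is computed
# on whatever device its inputs live on.
loss = plain(logits, targets)
print("loss device:", loss.device)

# The weighted criterion holds a tensor, so it is moved to match the inputs.
weighted = weighted.to(device)
weighted_loss = weighted(logits, targets)
print("weighted loss device:", weighted_loss.device)
```

If the criterion holds no tensors, there is nothing for .to(device) to move; if it does (as with the class weights here), those tensors need to be on the same device as the inputs.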
Thanks.