Here are the relevant lines from my validation step:
td = TweedieDevianceScore(power=self.power)
x = self.base_model(x, mask)
preds = torch.exp(self.linear(x).mean(axis=1).squeeze(1) + torch.log(exposure))
print(f'preds device: {preds.device} y device: {y.device}')
loss = td(preds=preds, targets=y)
The output of the print statement indicates that both tensors are on a GPU in every process:
preds device: cuda:0 y device: cuda:0
preds device: cuda:1 y device: cuda:1
preds device: cuda:2 y device: cuda:2
preds device: cuda:3 y device: cuda:3
And yet I am seeing the following error:
Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: validation_step)
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
File "/home/ubuntu/deep-behavior-embedding/src/model/lightning_model.py", line 375, in validation_step (Current frame)
loss = td(preds=preds, targets=y)
If preds and y are on the GPU, what else could be going wrong here?
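One thing the print statement does not cover is the metric object itself, which is constructed inside the step. Below is a minimal sketch of how that could be checked, assuming the metric keeps its running state in tensors the way an nn.Module does. `RunningSum` and `state_devices` are hypothetical names used only for illustration; they are not part of torchmetrics or of the code above.

```python
import torch
from torch import nn


class RunningSum(nn.Module):
    """Hypothetical stand-in for a stateful metric: keeps a running total in a buffer."""

    def __init__(self):
        super().__init__()
        # Buffers are created on the CPU unless a device is given explicitly.
        self.register_buffer("total", torch.tensor(0.0))

    def forward(self, preds, targets):
        # The in-place update touches the buffer as well as the inputs,
        # so the buffer's device matters in addition to preds and targets.
        self.total += ((preds - targets) ** 2).sum()
        return self.total


def state_devices(module):
    """Return the set of devices holding the module's parameters and buffers."""
    tensors = list(module.parameters()) + list(module.buffers())
    return {str(t.device) for t in tensors}


# Falls back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
preds = torch.tensor([1.0, 2.0, 3.0], device=device)
y = torch.tensor([1.0, 2.0, 5.0], device=device)

metric = RunningSum()
print(state_devices(metric))      # freshly built: state lives on the CPU

metric = metric.to(preds.device)  # move the state next to the inputs
loss = metric(preds, y)
print(loss.device, loss.item())
```

If TweedieDevianceScore here is the torchmetrics class, it is also an nn.Module, so it may be worth either calling .to(preds.device) on it or constructing it once in __init__ as an attribute of the LightningModule, where it would be moved together with the rest of the model. torchmetrics stores its state differently from plain buffers, so the helper above is an illustration of the idea, not a drop-in check.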