How to debug "Expected all tensors to be on the same device"

Running optim.step() throws this error, saying the tensors are not all on the same device. I explicitly moved the model and the data with .to(device) multiple times to make sure everything is on the GPU. When I do

for param in model.parameters():
    print(param.device)

It prints "cuda:0" for all the params. I checked all the tensors going in and out of the model but still can’t figure out the problem. Is there any hook or debug tool that can help me pinpoint the tensor causing this problem? Thanks in advance!

PS: If I change the device to "cpu", it runs fine without any complaints.

Check all tensors outside of the model.
Sorry I can't give more useful advice.
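
Since the error shows up in optim.step() and not in the forward pass, one thing outside the model that may be worth checking is the optimizer's own per-parameter state (for example Adam's moment buffers). These can stay on the CPU if the optimizer already took a step, or had its state loaded, before the model was moved. Below is a minimal sketch of such a check. The helper name find_optimizer_device_mismatches is made up for illustration and is not a PyTorch API; it only relies on the optimizer's param_groups and state attributes. It skips the "step" entry on the assumption that recent PyTorch versions keep that counter as a CPU tensor by design.

```python
def find_optimizer_device_mismatches(optimizer):
    """Hypothetical helper (not part of PyTorch).

    Returns a list of (group_index, param_index, state_key,
    param_device, state_device) for every optimizer state tensor
    that lives on a different device than its parameter.
    """
    mismatches = []
    for group_idx, group in enumerate(optimizer.param_groups):
        for param_idx, param in enumerate(group["params"]):
            # optimizer.state maps each parameter to its buffers;
            # it has no entry for a parameter before the first step.
            param_state = optimizer.state.get(param, {})
            for key, value in param_state.items():
                if key == "step":
                    # Assumption: the step counter may legitimately
                    # be a CPU tensor, so it is not reported.
                    continue
                state_device = getattr(value, "device", None)
                if state_device is None:
                    continue  # plain Python value, not a tensor
                if state_device != param.device:
                    mismatches.append(
                        (group_idx, param_idx, key,
                         str(param.device), str(state_device))
                    )
    return mismatches
```

Calling print(find_optimizer_device_mismatches(optim)) right before optim.step() should list any state buffer that is not on the same device as its parameter. An empty list suggests the optimizer state is not the culprit, and the mismatch is more likely in a tensor created by hand somewhere else.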