Note that this comes from the fact that the Tensor created from numpy was on the CPU, while the param was on the GPU.
Another way to solve this (without having to move the whole model back to the CPU) is to send the new grad to the GPU as well with .to(param.device).
I ran into this error again, and found that the real cause is that param.grad and the new grad are not on the same device.
So the best way to solve this error is to move the new grad to param.grad's device, e.g. with .to(param.device), before assigning it.
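A minimal sketch of the fix described above, assuming a hypothetical setup where the parameter may live on the GPU and the gradient is computed in numpy (and therefore starts on the CPU):

```python
import numpy as np
import torch

# Pick the GPU if one is available; otherwise everything stays on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
param = torch.nn.Parameter(torch.zeros(3, device=device))

# A gradient coming from numpy is always a CPU tensor...
new_grad = torch.from_numpy(np.ones(3, dtype=np.float32))

# ...so move it to the parameter's device before assigning it.
param.grad = new_grad.to(param.device)

print(param.grad.device == param.device)  # True
```

This avoids sending the whole model back to the CPU: only the single gradient tensor is transferred, and `.to()` is a no-op when source and destination devices already match.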