The error is most likely not with the .to() op. It’s just that cuda is asynchronous and so errors will point to the wrong line. Run with CUDA_LAUNCH_BLOCKING=1 to make sure the error points to the right line.
As far as I can see. This is nothing to do with to() method. In general, pytorch expects both the model and the input data to be of the same data type i.e float in our case. And if you are using GPU both the model weight and the input should be moved to gpu. Else the error mentioned will occur. As the first step check for the architecture class.
you likely did something wrong in your module code, e.g., not properly register parts as submodules.
@albanD CUDA is async, but out-of-kernel checks are not, since they are done without looking at the data contained. So CUDA_LAUNCH_BLOCKING won’t change things.