The optimizer's internal states might have been stored as CUDA tensors if you pushed the model to the GPU in your previous run.
Does your code work if you push the model to the GPU again before initializing the optimizer?
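
Here is a rough sketch of the order I mean. The toy `nn.Linear` model, the Adam optimizer, the checkpoint keys `"model"` and `"optimizer"`, and the in-memory buffer standing in for a checkpoint file are all placeholders I made up, not taken from your code, so adapt them to your setup:

```python
import io

import torch
import torch.nn as nn

# Falls back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# --- Previous run: one training step on `device`, then save a checkpoint ---
model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(8, 4, device=device)).sum().backward()
optimizer.step()  # creates the internal states (exp_avg, exp_avg_sq) on `device`

buffer = io.BytesIO()  # placeholder for a real checkpoint file
torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, buffer)
buffer.seek(0)

# --- Resumed run ---
checkpoint = torch.load(buffer, map_location="cpu")

resumed_model = nn.Linear(4, 2)
resumed_model.load_state_dict(checkpoint["model"])
resumed_model.to(device)  # push the model to the GPU first ...

# ... and only then create the optimizer and restore its state.
resumed_optimizer = torch.optim.Adam(resumed_model.parameters(), lr=1e-3)
resumed_optimizer.load_state_dict(checkpoint["optimizer"])

# The restored moment buffers should now live on the same device as the parameters.
states_match = all(
    resumed_optimizer.state[p]["exp_avg"].device == p.device
    for p in resumed_model.parameters()
)

# One more step to check that training can continue without a device mismatch.
resumed_optimizer.zero_grad()
resumed_model(torch.randn(8, 4, device=device)).sum().backward()
resumed_optimizer.step()
```

If `states_match` is `False` in your real script, or the last `step()` raises a device mismatch error, the optimizer was most likely created or restored before the model was moved.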