Code that loads SGD fails to load Adam state to GPU

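The optimizer's internal state might have been stored as CUDA tensors if you pushed the model to the GPU in your previous run. Adam keeps per-parameter buffers (its running averages), whereas plain SGD without momentum keeps none, which would explain why the same loading code only fails for Adam.
Does your code work if you push the model to the GPU again before initializing the optimizer?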
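
In case it helps, here is a minimal sketch of the order I have in mind: move the model to the device first, then create the optimizer, then load its state. The layer sizes, variable names, and the in-memory buffer standing in for a checkpoint file are all assumptions for illustration, not taken from your code.

```python
import io

import torch
import torch.nn as nn

# Falls back to the CPU so the sketch also runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# "Previous run": take one Adam step so the optimizer has internal state.
model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(8, 4, device=device)).sum().backward()
optimizer.step()

# Save to an in-memory buffer (a stand-in for a checkpoint file on disk).
buffer = io.BytesIO()
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, buffer)
buffer.seek(0)

# "Resumed run": load the checkpoint onto the CPU first.
checkpoint = torch.load(buffer, map_location="cpu")

resumed_model = nn.Linear(4, 2)
resumed_model.load_state_dict(checkpoint["model"])
resumed_model.to(device)  # push the model to the device BEFORE creating the optimizer

resumed_optimizer = torch.optim.Adam(resumed_model.parameters(), lr=1e-3)
resumed_optimizer.load_state_dict(checkpoint["optimizer"])

# The Adam buffers should now live on the same device as the parameters.
for param in resumed_model.parameters():
    exp_avg = resumed_optimizer.state[param]["exp_avg"]
    print(tuple(param.shape), param.device, exp_avg.device)

# One more step to confirm that the loaded state is usable.
resumed_model(torch.randn(8, 4, device=device)).sum().backward()
resumed_optimizer.step()
```

If the printed devices match and the last step runs without a device mismatch error, the ordering was the problem.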
