As this issue says, every time you move the model to a different device you should build the optimizer again. So, with the following code:
```python
import torch.optim as optim

model = Model()
model.cuda()
optimizer = optim.Adam(model.parameters())
for d, gt in trn_dataloader:
    # train
    ...
    optimizer.step()
    model.cpu()   # move to CPU
    # eval or do other things
    ...
    model.cuda()  # but finally, move back
```
Does the optimizer still run as expected?
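One way to check this empirically is to confirm that, after the CPU round trip, the optimizer still holds the very same `Parameter` objects as the model, and that a later `optimizer.step()` still changes them. The following is a minimal sketch under stated assumptions, not an authoritative answer. `nn.Linear(4, 2)` is a hypothetical stand-in for the unspecified `Model()`, the loss is arbitrary, and `same_param_objects` and `check_round_trip` are illustrative helper names. On a machine without a GPU, the sketch substitutes a dtype round trip (`.double()` then `.float()`). This substitution is an assumption: it exercises the same `nn.Module` conversion path that `.cpu()`/`.cuda()` use, but it is not literally a device move.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def same_param_objects(model: nn.Module, optimizer: optim.Optimizer) -> bool:
    """True if the optimizer holds exactly the model's current Parameter objects."""
    model_ids = {id(p) for p in model.parameters()}
    opt_ids = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return model_ids == opt_ids


def train_step(model: nn.Module, optimizer: optim.Optimizer, x: torch.Tensor) -> None:
    # Arbitrary loss, used only to produce non-zero gradients.
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


def check_round_trip() -> bool:
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    model = nn.Linear(4, 2).to(device)  # hypothetical stand-in for Model()
    optimizer = optim.Adam(model.parameters())
    x = torch.randn(8, 4, device=device)

    train_step(model, optimizer, x)  # Adam creates its per-parameter state here

    if use_cuda:
        model.cpu()   # move to CPU for eval or other work...
        model.cuda()  # ...and back, as in the question
    else:
        # Assumption: without a GPU, a dtype round trip stands in for the
        # device round trip (both go through nn.Module's conversion path).
        model.double()
        model.float()

    before = [p.detach().clone() for p in model.parameters()]
    train_step(model, optimizer, x)
    updated = any(not torch.equal(b, p) for b, p in zip(before, model.parameters()))
    return same_param_objects(model, optimizer) and updated


if __name__ == "__main__":
    print(check_round_trip())
```

If `same_param_objects` returned `False`, the model would have received new parameter tensors during the move. In that case the optimizer would keep updating tensors the model no longer uses, and rebuilding it, as the issue suggests, would be necessary.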