Effect of calling model.cuda() after constructing an optimizer

Can you try adagrad? Adam doesn’t initialize such buffers: https://github.com/pytorch/pytorch/blob/master/torch/optim/adam.py

2 Likes