Effect of calling model.cuda() after constructing an optimizer

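It is fine in the case of SGD, which only creates its momentum buffers lazily on the first step. However, if the optimizer constructs buffers in __init__ based on the parameter's type (and hence its device), you will run into problems: those buffers stay where the parameters were when the optimizer was built, while model.cuda() moves the parameters themselves to the GPU. Adagrad is one example: https://github.com/pytorch/pytorch/blob/master/torch/optim/adagrad.py#L30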
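A minimal sketch of the safe ordering, assuming a PyTorch version in which Adagrad still allocates its per-parameter "sum" buffer in the constructor. The toy nn.Linear model, the `device` fallback to CPU, and the helper `buffers_match_params` are illustrative names of mine, not anything from the thread:

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)
model.to(device)  # move the parameters first (same effect as model.cuda() on a GPU machine)

# Adagrad creates state["sum"] for every parameter right here, so the
# buffers are allocated on whatever device the parameters are on now.
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.1)


def buffers_match_params(model, optimizer):
    """Hypothetical helper: True if every Adagrad 'sum' buffer is on its parameter's device."""
    return all(
        optimizer.state[p]["sum"].device == p.device
        for p in model.parameters()
    )


buffers_ok = buffers_match_params(model, optimizer)

# One training step to confirm the optimizer runs without a device mismatch.
loss = model(torch.randn(3, 4, device=device)).sum()
loss.backward()
optimizer.step()
```

If the two lines were swapped on a GPU machine (optimizer first, then model.cuda()), I would expect the "sum" buffers to remain on the CPU while the parameters and gradients live on the GPU, which is the mismatch described above.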
