It is fine in the case of SGD, which creates its momentum buffers lazily on the first step. However, if the optimizer constructs buffers in __init__ based on the parameter's type (and device), then you will have a problem, e.g. https://github.com/pytorch/pytorch/blob/master/torch/optim/adagrad.py#L30
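A minimal sketch of the problem, assuming the current Adagrad behavior of allocating its `sum` accumulator eagerly in `__init__`: if you cast the model *after* constructing the optimizer, the pre-built buffers keep the old dtype and no longer match the parameters.

```python
import torch

model = torch.nn.Linear(4, 2)

# SGD: buffers are created lazily at step time, so construction order is safe.
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Adagrad: the 'sum' accumulator is allocated in __init__, matching each
# parameter's dtype/device at that moment.
adagrad = torch.optim.Adagrad(model.parameters(), lr=0.1)

# Casting the model afterwards leaves the accumulators behind:
model.double()
for p in model.parameters():
    print(p.dtype, adagrad.state[p]["sum"].dtype)
    # parameters are now float64, but the accumulators are still float32,
    # so the next adagrad.step() will fail on the dtype mismatch
```

The safe ordering is to finish all `.to()` / `.cuda()` / `.double()` calls on the model first, and only then construct the optimizer.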