I am trying to fine-tune a model that was trained on a single GPU, this time using multiple GPUs. First, I set CUDA_VISIBLE_DEVICES=0,1,2. Then I wrap the defined model with torch.nn.DataParallel() and use the RMSprop optimizer as follows:
This code works well if I train a model on multiple GPUs from scratch. However, if I start from a checkpoint of the single-GPU trained model, the following error is raised when execution reaches optimizer.step():

File "…/python2.7/site-packages/torch/optim/rmsprop.py", line 52, in step
    state = self.state[p]
KeyError: Parameter containing:
(  0,  0,.,.) = 1.00000e-02 * 2.5088
(  0,  1,.,.) = 1.00000e-02 * 1.6257
...
(127,126,.,.) = 1.00000e-02 * 2.5302
(127,127,.,.) = 1.00000e-02 * -4.7111
[torch.cuda.FloatTensor of size 128x128x1x1 (GPU 0)]
Does anyone know what the problem is? Thanks in advance!
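One well-known difference between the single-GPU and multi-GPU settings: nn.DataParallel stores the original network as its module attribute, so the wrapped model's state_dict() keys carry a "module." prefix (for example module.conv1.weight instead of conv1.weight). Whether this mismatch is what triggers the KeyError above is an assumption; the traceback points at optimizer.step(), so key naming is only one thing to rule out. A minimal, dependency-free sketch for converting checkpoint keys between the two layouts, assuming the checkpoint is a flat mapping from parameter names to tensors (the helper names are hypothetical):

```python
from collections import OrderedDict

# nn.DataParallel keeps the wrapped network as `self.module`,
# so every key in its state_dict() starts with this prefix.
PREFIX = "module."


def add_module_prefix(state_dict):
    """Return a copy whose keys match a DataParallel-wrapped model.

    Keys that already carry the prefix are left unchanged.
    """
    return OrderedDict(
        (k if k.startswith(PREFIX) else PREFIX + k, v)
        for k, v in state_dict.items()
    )


def strip_module_prefix(state_dict):
    """Return a copy whose keys match the bare (single-GPU) model."""
    return OrderedDict(
        (k[len(PREFIX):] if k.startswith(PREFIX) else k, v)
        for k, v in state_dict.items()
    )
```

An alternative that avoids renaming keys altogether is to load the single-GPU weights into the wrapped network directly through its module attribute, i.e. model.module.load_state_dict(...), or to load them into the bare model before wrapping it.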
I tried what was suggested, and it works. However, here is another, related problem: I also need to reuse the optimizer state saved during the previous single-GPU training rather than create a new optimizer.
When I load it, the same error occurs. I suspect something in the saved optimizer state is inconsistent with the multi-GPU setting. Any suggestions on how to solve this? Thanks.
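A minimal sketch of one ordering that runs in recent PyTorch releases: load the weights into the bare model, move it to the GPU, wrap it in nn.DataParallel, and only then create the optimizer and call load_state_dict on it. The post uses Python 2.7 and a much older PyTorch, whose optimizer internals may differ, so this is not a confirmed fix for that version. build_model, the tensor shapes, the learning rate, and the {"model": ..., "optimizer": ...} checkpoint layout are all hypothetical; on a CPU-only machine DataParallel simply calls the wrapped module.

```python
import torch
import torch.nn as nn


def build_model():
    # Hypothetical stand-in for the real network (one 128x128x1x1 conv).
    return nn.Sequential(nn.Conv2d(128, 128, kernel_size=1), nn.ReLU())


# --- Simulate the earlier single-GPU run and its checkpoint (assumed layout) ---
old_model = build_model()
old_opt = torch.optim.RMSprop(old_model.parameters(), lr=1e-3)
old_model(torch.randn(2, 128, 4, 4)).sum().backward()
old_opt.step()  # populates RMSprop state ("step", "square_avg")
checkpoint = {"model": old_model.state_dict(), "optimizer": old_opt.state_dict()}

# --- Fine-tune with DataParallel ---
model = build_model()
model.load_state_dict(checkpoint["model"])  # bare model: no "module." prefix needed
if torch.cuda.is_available():
    model = model.cuda()  # parameters must live on device_ids[0] before wrapping
model = nn.DataParallel(model)

# Create the optimizer from the wrapped model's parameters, then restore the
# saved state; load_state_dict also moves state tensors to each parameter's device.
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
optimizer.load_state_dict(checkpoint["optimizer"])

# One fine-tuning step (the call that raised KeyError in the post).
x = torch.randn(2, 128, 4, 4)
if torch.cuda.is_available():
    x = x.cuda()
optimizer.zero_grad()
model(x).sum().backward()
optimizer.step()
```

The design point is that nn.DataParallel(model).parameters() yields the wrapped network's own Parameter objects in the same order as before, and recent PyTorch versions map saved optimizer state to parameters by position within each param group, so the single-GPU state lines up one-to-one.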