Is the optimizer on CUDA?


I just have a quick question. I got a bit confused after reading this thread:

It’s quite old now, and I’ve seen things change quickly here!
My question is (I didn’t really find much about it on the forum or in the docs): does the optimizer itself have to be moved to CUDA?
I know you should move the model to CUDA before constructing the optimizer (so its `state_dict` ends up on CUDA, I guess), and move the inputs to CUDA too. But what about the optimizer itself? It does have a `state_dict`, but I found nothing about it except the linked example. Does that mean the optimization computations are always performed on the GPU (or maybe on the CPU)? Are `optimizer.state.values()` the same as its `state_dict`?
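For context, here is what I observed when poking around (a minimal sketch, assuming a recent PyTorch; the model and names below are just made up for illustration). The optimizer doesn’t seem to have a device of its own: its per-parameter state is created lazily at the first `step()`, on the same device as the parameter it belongs to, and `state_dict()` looks like a serializable view of `optimizer.state` plus the `param_groups`:

```python
import torch

# Tiny throwaway model on CPU; the same should hold after .cuda().
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# No per-parameter state exists yet before the first step().
print(opt.state_dict()["state"])  # empty dict at this point

loss = model(torch.randn(8, 4)).sum()
loss.backward()
opt.step()

# After one step, each momentum buffer sits on the same device
# as the parameter it tracks.
for p, s in opt.state.items():
    assert s["momentum_buffer"].device == p.device
```

So if I read this right, `optimizer.state` is keyed by the parameter tensors themselves, while `state_dict()` re-keys the same tensors by integer index so it can be saved and reloaded, but please correct me if that’s wrong.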

I’m quite curious, so if someone knows what determines the device used for the computations… I know that `model.cuda()` moves the optimizable parameters to CUDA and it just works, but what if I put half of the layers’ parameters on CUDA and half on the CPU? Where would the computations happen then?

Bonus question: according to this other thread, it looks like constructing the optimizer before `model.cuda()` works just fine, except for optimizers that use a “buffer” (I don’t know what this buffer is), which makes me think the optimizer “lives its own life” regardless of where the model is stored…
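If it helps anyone answer: I’m guessing the “buffer” is something like SGD’s momentum buffer (or Adam’s running averages). A hedged sketch of what I’d expect with a recent PyTorch, since the buffer seems to be created lazily at the first `step()` on the gradient’s device (older versions may well have behaved differently, which would explain that thread):

```python
import torch

model = torch.nn.Linear(4, 2)
# Optimizer constructed BEFORE moving the model.
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

dev = "cuda" if torch.cuda.is_available() else "cpu"
model.to(dev)   # moves the parameters in place, so the optimizer
                # still references the same (now moved) tensors

model(torch.randn(3, 4, device=dev)).sum().backward()
opt.step()

# The momentum buffer only comes into existence here, on `dev`.
buf = next(iter(opt.state.values()))["momentum_buffer"]
print(buf.device)
```

If that lazy creation is indeed the current behavior, does the old warning about buffers still apply at all?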

Thanks in advance,

Please allow me to re-up this question :slightly_smiling_face: