Is the optimizer on CUDA?


I just have a quick question. I got a bit confused after reading this thread:

It’s quite old now, and I’ve seen things change quickly here!
My question is (I didn’t really find much about it on the forum or in the docs): does the optimizer itself have to be moved to CUDA?
I know you should move the model to CUDA before constructing the optimizer (so its `state_dict` ends up on CUDA, I guess), and move the inputs to CUDA too. But what about the optimizer itself? It does have a `state_dict`, but I found nothing about it except the linked example. Does that mean the optimization computations are always performed on the GPU (or maybe on the CPU)? Are `optimizer.state.values()` the same as its `state_dict`?
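For context, here is what I observed when poking around (a minimal sketch, assuming a recent PyTorch; the model and names below are just made up for illustration). The optimizer doesn’t seem to have a device of its own: its per-parameter state is created lazily at the first `step()`, on the same device as the parameter it belongs to, and `state_dict()` looks like a serializable view of `optimizer.state` plus the `param_groups`:

```python
import torch

# Tiny throwaway model on CPU; the same should hold after .cuda().
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# No per-parameter state exists yet before the first step().
print(opt.state_dict()["state"])  # empty dict at this point

loss = model(torch.randn(8, 4)).sum()
loss.backward()
opt.step()

# After one step, each momentum buffer sits on the same device
# as the parameter it tracks.
for p, s in opt.state.items():
    assert s["momentum_buffer"].device == p.device
```

So if I read this right, `optimizer.state` is keyed by the parameter tensors themselves, while `state_dict()` re-keys the same tensors by integer index so it can be saved and reloaded, but please correct me if that’s wrong.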

I’m quite curious, so if someone knows what determines the device used for the computations… I know that `model.cuda()` moves the optimizable parameters to CUDA and it just works, but what if I put half of the layers’ parameters on CUDA and half on the CPU? Where would the computations happen then?

Bonus question: according to this other thread, it looks like constructing the optimizer before `model.cuda()` works just fine, except for optimizers that use a “buffer” (I don’t know what this buffer is), which makes me think the optimizer “lives its own life” regardless of where the model is stored…
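If it helps anyone answer: I’m guessing the “buffer” is something like SGD’s momentum buffer (or Adam’s running averages). A hedged sketch of what I’d expect with a recent PyTorch, since the buffer seems to be created lazily at the first `step()` on the gradient’s device (older versions may well have behaved differently, which would explain that thread):

```python
import torch

model = torch.nn.Linear(4, 2)
# Optimizer constructed BEFORE moving the model.
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

dev = "cuda" if torch.cuda.is_available() else "cpu"
model.to(dev)   # moves the parameters in place, so the optimizer
                # still references the same (now moved) tensors

model(torch.randn(3, 4, device=dev)).sum().backward()
opt.step()

# The momentum buffer only comes into existence here, on `dev`.
buf = next(iter(opt.state.values()))["momentum_buffer"]
print(buf.device)
```

If that lazy creation is indeed the current behavior, does the old warning about buffers still apply at all?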

Thanks in advance,

Please allow me to re-up this question :slightly_smiling_face: