I just have a quick question.
I got a bit confused after reading this thread:
It's quite old now, and I've seen things change quickly here!
My question is (I didn't really find much about it on the forum or in the docs): does the optimizer have to be moved onto CUDA as well?
I know you must move the model onto CUDA before constructing the optimizer (so its `state_dict` ends up on CUDA too, I guess), and move the inputs onto CUDA as well. But what about the optimizer itself? It also has a `state_dict`, but I found nothing about it except the linked example. Does that mean the optimization computations are always performed on the GPU (or possibly the CPU)? Are the values in `optimizer.state` the same as those in its `state_dict`?
I'm quite curious, so I'd love to hear from someone who knows what determines the device used for the computations… I know that `model.cuda()` moves the optimizable parameters onto CUDA and it just works, but what if I put half the layers' parameters on CUDA and half on the CPU? Where would the computations happen then?
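For what it's worth, here is a small sketch of what I mean (just an assumption of mine, not verified documentation): since the optimizer's update is applied element-wise to each parameter tensor, it would presumably run on whatever device each individual parameter lives on, even when parameters are split across devices. This falls back to CPU-only when no GPU is available:

```python
import torch

# Two standalone "layers" on (potentially) different devices; the
# optimizer update runs element-wise on each parameter's own device.
dev = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.nn.Parameter(torch.randn(3))              # stays on CPU
b = torch.nn.Parameter(torch.randn(3, device=dev))  # on GPU if present

opt = torch.optim.SGD([a, b], lr=0.1, momentum=0.9)
# .cpu() is autograd-aware, so the loss can mix devices:
loss = a.sum() + b.sum().cpu()
loss.backward()
opt.step()

# Each momentum buffer sits on the same device as its parameter:
for p in (a, b):
    print(opt.state[p]["momentum_buffer"].device)
```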
Bonus question: according to this other thread, it looks like constructing the optimizer before
`model.cuda()` works just fine, except for optimizers that use a "buffer" (I don't know what this buffer is), which makes me think the optimizer "lives its own life" regardless of where the model is stored…
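If it helps, this is what I observe locally (take it as a sketch, not an authoritative answer): in current PyTorch versions, `opt.state` is empty right after construction, and state entries such as SGD's momentum buffer only appear on the first `step()`, which would explain why the construction order mostly doesn't matter:

```python
import torch

model = torch.nn.Linear(4, 2)
# Construct the optimizer BEFORE any device move:
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
print(len(opt.state))  # 0: no buffers exist yet

# (A model.cuda() call could happen here; the momentum "buffer" is
# created lazily on the first step(), on whatever device the
# parameter is on at that moment.)
loss = model(torch.randn(3, 4)).sum()
loss.backward()
opt.step()
print(sorted(opt.state[model.weight].keys()))
```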
Thanks in advance,