Hey!
I just have a quick question.
I got a bit confused after reading this thread:
It's quite old now, and I've seen things change quickly here!
My question is (I didn't really find much about it on the forum or in the docs): does the optimizer have to be moved to CUDA?
I know you must move the model to CUDA before constructing the optimizer (so its `state_dict` ends up on CUDA, I guess), and move the inputs to CUDA too. But what about the optimizer? It also has a `state_dict`, but I found nothing about it except the linked example. Does that mean the optimization computations are always performed on the GPU (or maybe the CPU)? Are `optimizer.state.values()` the same as its `state_dict`?
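Here's a small sketch of what I mean (assuming a recent PyTorch, and falling back to CPU if no GPU is around). As far as I can tell, the optimizer's state tensors are created lazily and land on the same device as the parameters they belong to:

```python
import torch
import torch.nn as nn

# Minimal sketch: move the model first, then build the optimizer,
# then check where the optimizer's state actually lives.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Run one step so Adam lazily allocates its state (exp_avg, exp_avg_sq).
out = model(torch.randn(8, 4, device=device))
out.sum().backward()
optimizer.step()

# optimizer.state maps each parameter to its per-parameter state dict;
# the state tensors end up on the same device as the parameter.
for param, state in optimizer.state.items():
    print(param.device, state["exp_avg"].device)
```

So at least in this experiment, the state follows the parameters, without any explicit `optimizer.cuda()` call.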
I'm quite curious, so if someone knows what determines the device used for the computations… I know that model.cuda() moves the optimizable parameters to CUDA and it just works, but what if I put half the layers' parameters on CUDA and half on the CPU? Where would the computations happen then?
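To make the half-and-half scenario concrete, here's a toy sketch (the device assignment is my own made-up example). My understanding is that each layer's computation runs on whatever device its parameters live on, and you have to move the activations between devices yourself:

```python
import torch
import torch.nn as nn

# Hypothetical split: first layer on CPU, second on GPU (if available).
dev0 = torch.device("cpu")
dev1 = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

layer1 = nn.Linear(4, 8).to(dev0)   # this matmul runs on dev0
layer2 = nn.Linear(8, 2).to(dev1)   # this matmul runs on dev1

x = torch.randn(3, 4, device=dev0)
h = layer1(x)            # computed on dev0
y = layer2(h.to(dev1))   # move activations explicitly, then compute on dev1

print(y.device)
```

Without the explicit `h.to(dev1)`, mixing devices raises an error, which suggests the computation device is simply determined by where the tensors involved are stored.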
Bonus question: according to this other thread, it looks like constructing the optimizer before calling model.cuda()
works just fine, except for optimizers that use a “buffer” (I don't know what this buffer is), which makes me think the optimizer “lives its own life” regardless of where the model is stored…
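I tried to poke at this “buffer” myself; for SGD with momentum it seems to be the per-parameter momentum_buffer in optimizer.state. A sketch of what I observe (assuming a recent PyTorch, where the buffer is created lazily on the first step as a clone of the gradient, so the construction order doesn't seem to matter anymore):

```python
import torch
import torch.nn as nn

# Construct the optimizer BEFORE moving the model.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# .cuda()/.to() replace each parameter's data in place, so the optimizer
# still references the same Parameter objects afterwards.
if torch.cuda.is_available():
    model.cuda()

device = next(model.parameters()).device
out = model(torch.randn(5, 4, device=device))
out.sum().backward()
optimizer.step()  # momentum_buffer is allocated here, on the parameter's device

for p, state in optimizer.state.items():
    print(p.device, state["momentum_buffer"].device)
```

If older versions allocated the buffer eagerly at construction time, that would explain why ordering used to matter for those optimizers, but I'm not sure about the history.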
Thanks in advance,
Regards,
Florent