So I trained my neural network on, say, GPU #3 and saved the state dict. Do I have to use GPU #3 every time I want to load the state dict?
Also, will it work if I don't move my model to CUDA before loading the state dict into the model?
You can use the map_location argument in torch.load to map your GPU tensors to the CPU, as explained in the note section.
Alternatively, you could of course push the model to the CPU before saving its state_dict.
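
To make the map_location suggestion concrete, here is a minimal sketch. The nn.Linear model and the in-memory buffer are stand-ins I made up for illustration (a file path should work the same way), and the device index is an assumption. As far as I know, load_state_dict copies the values into the model's existing parameters, so the model does not have to be on CUDA before loading.

```python
import io

import torch
import torch.nn as nn

# Hypothetical small model standing in for the trained network.
model = nn.Linear(4, 2)

# Save the state_dict to an in-memory buffer instead of a file.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location="cpu" remaps every stored tensor to the CPU,
# whichever device it was saved from.
state_dict = torch.load(buffer, map_location="cpu")

# Load into a fresh model that still lives on the CPU.
restored = nn.Linear(4, 2)
restored.load_state_dict(state_dict)

# Afterwards, move the model to whatever device is available
# (cuda:0 is an assumption; any valid device index would do).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
restored.to(device)
```

Loading on the CPU first and then calling .to(device) means the checkpoint no longer depends on the GPU index used during training.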
Say I want to save the state_dict every epoch: do I have to push the model back to the GPU after saving the state_dict on the CPU?
Yes, the model would have to be copied to the GPU again.
However, depending on the length of your epoch and how often you save the model (e.g. only the best model using the validation accuracy), the latency might be negligible.
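
Roughly, the round trip could look like the sketch below. The model, the validation accuracies, and the in-memory checkpoint buffer are all made up for illustration, and the training step itself is left out. It only saves when the validation accuracy improves, so the CPU/GPU copies happen rarely.

```python
import io

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)  # hypothetical stand-in model

val_accuracies = [0.6, 0.8, 0.7]  # made-up values for illustration
best_acc = 0.0
checkpoint = None  # an in-memory buffer stands in for a checkpoint file

for epoch, val_acc in enumerate(val_accuracies):
    # ... one epoch of training on `device` would run here ...
    if val_acc > best_acc:
        best_acc = val_acc
        model.to("cpu")  # push the model to the CPU before saving
        checkpoint = io.BytesIO()
        torch.save(model.state_dict(), checkpoint)
        model.to(device)  # copy it back so training can continue
```

With these made-up accuracies the model is saved after the first and second epoch only, so the extra copies happen twice in three epochs.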