Correct way to save model while training

adishegde · July 28, 2019, 9:27am

I want to save my model every few epochs during training and was wondering about the best way to go about it. The approach suggested in this link seems to be a common/popular way to do so.

However, I don’t fully understand how the above method works. By calling model.cpu() and then model.cuda(), won’t we be creating new parameter objects, different from the ones that existed before either call, as suggested in the docs? If this is true, won’t we need to change the parameters the optimizer updates? I haven’t seen this mentioned anywhere, though.

I’m just getting started with PyTorch and so I apologize for any ignorance on my part.

ptrblck · July 28, 2019, 12:06pm

Pushing the model to the CPU and back to the GPU shouldn’t be a problem: the id of the parameters shouldn’t change, so the optimizer still holds valid references to them.
However, I would suggest staying on the same device after storing the state_dict, since the internal states of the optimizer (e.g. when using Adam) are also stored on the initial device.
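
A minimal sketch of how this could be checked, assuming a toy nn.Linear model and an Adam optimizer (both chosen only for illustration, as is the in-memory buffer that stands in for a checkpoint file). It falls back to the CPU when no GPU is available, in which case the round trip is trivially a no-op:

```python
import io

import torch
import torch.nn as nn

# Hypothetical toy setup; any nn.Module and optimizer would do.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step so that Adam creates its internal state tensors.
loss = model(torch.randn(8, 4, device=device)).sum()
loss.backward()
optimizer.step()

ids_before = [id(p) for p in model.parameters()]

# Round trip through the CPU and back to the original device.
model.cpu()
model.to(device)

ids_after = [id(p) for p in model.parameters()]
same_objects = ids_before == ids_after

# The optimizer's param groups should still reference the model's Parameter objects.
opt_params = [p for group in optimizer.param_groups for p in group["params"]]
optimizer_in_sync = all(a is b for a, b in zip(opt_params, model.parameters()))

# Save both state_dicts while model and optimizer state live on the same device.
buffer = io.BytesIO()  # stand-in for a checkpoint file path
torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, buffer)
checkpoint_bytes = buffer.getbuffer().nbytes

print(same_objects, optimizer_in_sync)
```

Saving optimizer.state_dict() alongside model.state_dict() is what allows training to resume with Adam's running averages intact.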

adishegde · July 28, 2019, 2:20pm

Thanks! Yeah, I’ll make sure to use the same device.

Just a quick follow-up though: what does the id of the parameters depend on?

model_1.cpu()
model_2.cpu()

model_2.cuda()
model_1.cuda()

Would the id of the parameters change in this case? Just trying to avoid possible mistakes.
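
One way to check this empirically is a small sketch like the one below. It assumes two toy nn.Linear models and a hypothetical helper param_ids, neither of which is part of the original setup, and it only moves to the GPU when one is available:

```python
import torch
import torch.nn as nn

# Hypothetical toy models, used only for this check.
model_1 = nn.Linear(4, 2)
model_2 = nn.Linear(4, 2)


def param_ids(model):
    """Return the Python object ids of a module's parameters."""
    return [id(p) for p in model.parameters()]


before = (param_ids(model_1), param_ids(model_2))

# Interleaved moves, in the same order as in the question.
model_1.cpu()
model_2.cpu()
if torch.cuda.is_available():
    model_2.cuda()
    model_1.cuda()

after = (param_ids(model_1), param_ids(model_2))
ids_unchanged = before == after

print(ids_unchanged)
```

Each module keeps its own Parameter objects, so under PyTorch's default conversion behavior the order in which the two models are moved should not matter.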