I want to keep the best model around during training without writing it to disk every time a new best model is encountered, which is what I’m currently doing.
Can I somehow copy the model that is on the GPU over to the CPU? I don’t want to make an extra copy on the GPU, since I need that memory.
You can keep a copy of its state_dict(). This should work:
# detach().cpu().clone() guarantees an independent CPU copy even for tensors
# that are already on the CPU, where .to('cpu') is a no-op and returns an
# alias that later training steps would overwrite.
# A plain dict is fine; load_state_dict accepts any mapping.
best_model_state_dict = {
    k: v.detach().cpu().clone() for k, v in model.state_dict().items()
}
Later, you can restore those weights with model.load_state_dict(best_model_state_dict).
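Here is a minimal sketch of the whole pattern in a training loop, using a toy nn.Linear model and a made-up validation loss just for illustration (the model, loop, and loss are stand-ins, not your code):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for your (possibly GPU-resident) model

best_loss = float("inf")
best_state = None

for step in range(3):
    val_loss = 1.0 / (step + 1)  # hypothetical validation loss
    if val_loss < best_loss:
        best_loss = val_loss
        # snapshot the weights to CPU memory; clone() makes the copy
        # independent of the live parameters
        best_state = {k: v.detach().cpu().clone()
                      for k, v in model.state_dict().items()}

# at the end of training, restore the best weights
model.load_state_dict(best_state)
```

The snapshot lives only in host RAM, so it costs no GPU memory and no disk I/O; you pay one device-to-host transfer per improvement.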