I’m trying to deepcopy a model that was first trained on a GPU, and then moved to CPU, with the following code.
template_model, template_model_dict = template_model_pair
self.train(template_model, train_loader, valid_loader, num_epochs=self.base_epochs, stage='base', cuda=True)
template_model.cpu()
if next(template_model.parameters()).is_cuda:
raise TypeError("Model is on GPU!")
for key, component in template_model_dict.items():
if next(template_model.parameters()).is_cuda:
raise ValueError("{} is on GPU!".format(key))
population = [deepcopy(template_model_pair) for _ in range(init_size)]
template_model
is the model and template_model_dict
is a dictionary that maps it keys to some of the layers of the model so generally they point to the same thing.
As in the code I’ve verified that the model is indeed on CPU. But when I do the deepcopy, I get CUDA error: out of memory
.
I’ve also done some experiments and found that if the model is not trained, the code would have behaved as expected. But if the model is trained, then it would seems some part of the model is still on the GPU, even though I’ve explicitly move the model to CPU, and deepcopying the model would still take up space on the GPU.
I would like to know why this is the case (like, which part of the model are still on the GPU), and is there any way I can solve this? Thanks!