Deepcopying models on CPU causing CUDA error: out of memory

I’m trying to deepcopy a model that was first trained on a GPU and then moved to the CPU, using the following code.

from copy import deepcopy

template_model, template_model_dict = template_model_pair

# Train on the GPU, then move the model back to the CPU.
self.train(template_model, train_loader, valid_loader, num_epochs=self.base_epochs, stage='base', cuda=True)
template_model.cpu()

# Sanity checks: the model and every tracked component should now be on the CPU.
if next(template_model.parameters()).is_cuda:
    raise TypeError("Model is on GPU!")

for key, component in template_model_dict.items():
    if next(component.parameters()).is_cuda:
        raise ValueError("{} is on GPU!".format(key))

population = [deepcopy(template_model_pair) for _ in range(init_size)]

template_model is the model and template_model_dict is a dictionary that maps keys to some of the layers of the model, so the dictionary entries point to the same module objects that live inside the model.
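Roughly, the pair is put together like this (the layer names here are only placeholders for the real ones):

template_model = MyNet()                          # an nn.Module
template_model_dict = {
    'encoder': template_model.encoder,            # references to the same module
    'head': template_model.head,                  # objects, not copies
}
template_model_pair = (template_model, template_model_dict)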

As the code shows, I’ve verified that the model is indeed on the CPU. But when I do the deepcopy, I get CUDA error: out of memory.

I’ve also done some experiments and found that if the model is not trained, the code behaves as expected. But if the model is trained, it seems some part of it is still on the GPU, even though I’ve explicitly moved the model to the CPU, and deepcopying it still takes up space on the GPU.
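One way to see exactly which tensors are left behind is to walk each submodule and check not just parameters() and buffers() but also plain Python attributes, since .cpu() only moves the first two. A rough sketch (find_cuda_tensors is just a helper I made up for this):

import torch
import torch.nn as nn

def find_cuda_tensors(model: nn.Module):
    """Return (path, shape) for every tensor that is still on a CUDA device."""
    found = []
    for mod_name, module in model.named_modules():
        prefix = mod_name + '.' if mod_name else ''
        # Parameters and registered buffers are moved by .cpu(); checking them
        # just confirms nothing was missed there.
        for name, t in list(module.named_parameters(recurse=False)) + \
                       list(module.named_buffers(recurse=False)):
            if t.is_cuda:
                found.append((prefix + name, tuple(t.shape)))
        # Plain attributes holding tensors (or lists of tensors) are NOT
        # touched by .cpu().
        for name, value in vars(module).items():
            if isinstance(value, torch.Tensor):
                tensors = [value]
            elif isinstance(value, (list, tuple)):
                tensors = [v for v in value if isinstance(v, torch.Tensor)]
            else:
                tensors = []
            for t in tensors:
                if t.is_cuda:
                    found.append((prefix + name, tuple(t.shape)))
    return found

for path, shape in find_cuda_tensors(template_model):
    print(path, shape)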

I would like to know why this is the case (i.e., which parts of the model are still on the GPU), and whether there is any way I can solve this. Thanks!

What is template_model_pair? In general, to clone a model, copy.deepcopy(model) works.

template_model_pair = (template_model, template_model_dict) is a tuple of template_model and template_model_dict. As I said, both have been moved to the CPU, but copying them still takes up space on the GPU.

It turned out that I had forgotten I’d modified my model class to save the feature maps of each layer. The .cpu() method doesn’t know that those tensors have to be moved to the CPU too. LOL
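For anyone hitting the same thing, here is a minimal sketch of the cause (LayerWithCache and saved_features are just stand-ins for my actual class) and two ways to fix it before deepcopying. It needs a GPU to run:

import torch
import torch.nn as nn
from copy import deepcopy

class LayerWithCache(nn.Module):
    """Toy layer that stashes its output, like my modified model class did."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)
        self.saved_features = []          # plain Python list, invisible to .cpu()

    def forward(self, x):
        out = self.linear(x)
        self.saved_features.append(out.detach())  # stays on whatever device produced it
        return out

model = LayerWithCache().cuda()
model(torch.randn(4, 8, device='cuda'))
model.cpu()

# Parameters are now on the CPU, but the cached activations are still CUDA
# tensors, so deepcopy(model) would allocate new CUDA memory for them.
print(next(model.parameters()).is_cuda)   # False
print(model.saved_features[0].is_cuda)    # True

# Fix 1: drop the cache before copying.
model.saved_features.clear()

# Fix 2 (if the cache is needed): move it to the CPU explicitly instead.
# model.saved_features = [t.cpu() for t in model.saved_features]

clone = deepcopy(model)                   # no CUDA allocation any more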