Are there any recommended methods to clone a model?

What happens if I do this:

hNetModel = Model()
for trainBatch, trainLabels in hTrainLoader:
    # <Train the model by a function>
    modelEvaluationMetric = hNetModel(Validation)
    if modelEvaluationMetric < bestModelMetric:
        hBestModel = hNetModel

Namely, I run the model through the optimization, and if its performance is the best so far I use hBestModel = hNetModel.
At the end I save the state dictionary of hBestModel.
Does this make sense, or is it always just another reference to the same net?

It’s just a reference to the same net, so it will be changed when you keep optimizing.
You’ll need to use deepcopy as suggested.
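
To illustrate the difference, here is a minimal sketch. It assumes a small nn.Linear as a stand-in for the real model and fakes the optimization step with an in-place weight update; the names alias and snapshot are only for illustration.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)  # stand-in for the real network (assumption)

alias = model                    # plain assignment: the same object
snapshot = copy.deepcopy(model)  # independent copy of all parameters

# Stand-in for an optimizer step: change the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

# The alias follows the update, the deep copy keeps the old weights.
alias_follows = torch.equal(alias.weight, model.weight)
snapshot_unchanged = not torch.equal(snapshot.weight, model.weight)
print(alias is model, alias_follows, snapshot_unchanged)
```

In the training loop from the question, that would mean storing copy.deepcopy(hNetModel) instead of hNetModel whenever the metric improves.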

Even if the training happens in a different function?
Something like:

for hNetModel in NetList:
    hNet = TrainNet(hNetModel)
    modelEvaluationMetric = hNetModel(Validation)
    if modelEvaluationMetric < bestModelMetric:
        hBestModel = hNet

I thought at least when something gets back from a function it is a different copy of it (Yea, I’m not so experienced with Python).
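
For what it’s worth, Python does not copy an object when a function returns it; the caller gets a reference to the same object. A minimal sketch, using a hypothetical TinyNet class and train_net function as stand-ins for the real model and TrainNet:

```python
import copy


class TinyNet:
    """Hypothetical stand-in for a model; holds a list of weights."""

    def __init__(self):
        self.weights = [0.0, 0.0]


def train_net(net):
    # Modifies the object it was given and returns that same object.
    net.weights = [w + 1.0 for w in net.weights]
    return net


net = TinyNet()
before = copy.deepcopy(net)  # explicit, independent copy
returned = train_net(net)

print(returned is net)  # the returned object is the original, not a copy
print(net.weights, before.weights)
```

So unless the function makes a copy itself, the value it returns is still the original net.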

Just to make your answer clear, you mean:

new_mdl = copy.deepcopy(model)

right?

Why is deepcopy not working for you? In what way does it behave differently from what you expected?

Does something inspired from:

or

not work for you?

Hi, copy.deepcopy(model) works fine for me in previous PyTorch versions, but as I’m migrating to version 0.4.0, it seems to break. It seems to have something to do with torch.device. How should I do cloning properly in version 0.4.0?

The traceback is as follows:
(Before this, I run
device = torch.device('cuda')
generator = Generator(args.vocab_size, g_embed_dim, g_hidden_dim, device).to(device)
and when I replace device with the string 'cuda', it works.)

Traceback (most recent call last):
  File "main.py", line 304, in <module>
    rollout = Rollout(generator, args.update_rate)
  File "/home/x-czh/SeqGAN-PyTorch/rollout.py", line 14, in __init__
    self.own_model = copy.deepcopy(model)
  File "/usr/lib/python3.5/copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib/python3.5/copy.py", line 297, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python3.5/copy.py", line 155, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.5/copy.py", line 243, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.5/copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib/python3.5/copy.py", line 292, in _reconstruct
    y = callable(*args)
  File "/usr/lib/python3.5/copyreg.py", line 88, in __newobj__
    return cls.__new__(cls, *args)
TypeError: Device() received an invalid combination of arguments - got (), but expected one of:

  • (torch.device device)
  • (str type, int index)

Deepcopy is not working for me.

I have a function train(model) which returns the trained model: model_trained = train(model_untrained). However, as a result both models end up trained, and I want model_untrained to stay unchanged. So I tried to deepcopy model_untrained inside the function before the training loop, but that is not working: the model is not trained correctly. Any idea why this is happening?

Are you trying to train the copied or the original model?
In the first case I assume the optimizer doesn’t have the references to the appropriate parameters, thus probably no model is trained.
Could you check it?

Yes I am training the copied model. You are right about the optimizer, I was passing the original model parameters to it. Thanks for spotting it!
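
For anyone hitting the same thing, a minimal sketch of the fixed pattern: the optimizer is built from the copy’s parameters. The nn.Linear model, the dummy data, the SGD settings and the steps argument are all assumptions made for illustration.

```python
import copy

import torch
import torch.nn as nn


def train(model, steps=5):
    # Work on a copy so the caller's model stays untouched.
    trained = copy.deepcopy(model)
    # Build the optimizer from the COPY's parameters, not the original's.
    optimizer = torch.optim.SGD(trained.parameters(), lr=0.1)
    x = torch.ones(4, 2)   # dummy inputs (assumption)
    y = torch.zeros(4, 1)  # dummy targets (assumption)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(trained(x), y)
        loss.backward()
        optimizer.step()
    return trained


torch.manual_seed(0)
model_untrained = nn.Linear(2, 1)
weights_before = model_untrained.weight.detach().clone()
model_trained = train(model_untrained)
```

If the optimizer were created from model.parameters() instead, the backward pass would fill gradients on the copy while the optimizer stepped the original, whose gradients are never set, so neither model would be updated.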

I was trying to copy a model where the forward function is using @torch.jit.script_method so that I can load it later in C++.
But when I am using deepcopy it gives the error:
can't pickle BaseModel objects
where BaseModel is classname of my model. The same code is working correctly without using jit decorator. This could be something trivial but I am unable to find a workaround.

import pickle
copied_model = pickle.loads(pickle.dumps(model))

Can confirm, deepcopy does not work (changes to original still reflected in copy) but pickle does work.

classifier = pickle.loads(pickle.dumps(self.classifier))
TypeError: can't pickle module objects

Using Adam’s suggestion:

threw:

TypeError: can't pickle dict_keys objects

for the model I am working with.

I am using Python 3.7 and the model was trained on multiple GPUs.

Has anyone run into this issue with their models? Any ideas how to fix it?

Searching online, I found a similar issue with deepcopy (but not in the context of PyTorch):

Apparently in Python 3 you have to wrap dict.keys() in list(), otherwise the deepcopy issue appears.

The answer turned out to be pretty simple. The instance attributes of your model have to be picklable. In my particular case, storing dict_keys caused the issue. Converting those to a list resolved it:

model.attribute = list(model.attribute)  # where attribute was dict_keys
model_clone = copy.deepcopy(model)
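
The same behaviour can be reproduced without PyTorch. A minimal sketch, where Holder is a hypothetical class that stores a dict_keys view as an attribute the way my model did:

```python
import copy


class Holder:
    """Hypothetical object that keeps a dict_keys view as an attribute."""

    def __init__(self):
        self.names = {"conv": 1, "fc": 2}.keys()


holder = Holder()

# deepcopy falls back to the pickle protocol, which rejects dict_keys.
try:
    copy.deepcopy(holder)
    failed = False
except TypeError:
    failed = True

# After converting the view to a list, the copy goes through.
holder.names = list(holder.names)
clone = copy.deepcopy(holder)
print(failed, clone.names)
```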

If I just want to copy the state dict then would temp = model.state_dict() work or do I need deep copy for state_dict as well? I later keep training so would the temp variable change?

Hi,

Yes, you need to deepcopy it if you want an independent copy.
model.state_dict() returns references to the model’s tensors, so with a plain assignment the temp value will change when you update the model.
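
A minimal sketch of the difference, assuming a small nn.Linear as the model and an in-place weight update as a stand-in for a training step:

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)  # stand-in for the real network (assumption)

temp = model.state_dict()                   # tensors share memory with the model
saved = copy.deepcopy(model.state_dict())   # independent tensors

# Stand-in for a training update: change the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

temp_tracks_model = torch.equal(temp["weight"], model.weight)
saved_tracks_model = torch.equal(saved["weight"], model.weight)
print(temp_tracks_model, saved_tracks_model)
```

Here temp follows the update while saved keeps the values from before it.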

I use the PyTorch C++ interface and need to deep copy modules. I think I am going to go this route: 1) dump one module to disk using torch save, 2) load the dumped file into a new module instance.

What are the corresponding methods in the C++ API?