Deep copy of model weights

sofiane · February 23, 2022, 4:32am

I’d like to make a deep copy of a model’s weights. It looks as if the first deepcopy only copies a dictionary of references, which is not what deepcopy is supposed to do. train(30) trains the model for 30 episodes, and this seems to affect the state_dict that I deep-copied beforehand.

from copy import deepcopy

model_sd = deepcopy(network.state_dict())          # snapshot before training
train(30)
another_model_sd = deepcopy(network.state_dict())  # snapshot after training
model_sd['l.0.weight'] == another_model_sd['l.0.weight']

I tried the same thing with a deepcopy of the network itself:

copy1 = deepcopy(network)
train(30)
copy2 = deepcopy(network)
copy1.l[0].weight == copy2.l[0].weight

Output for both methods:

tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True],
        [True, True, True, True],
        …
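
For context on the "reference dictionary" suspicion: a plain state_dict() call does return tensors that share storage with the live parameters, while a deepcopy of it should be an independent snapshot. The sketch below illustrates that difference on a toy nn.Linear model; the model and the in-place update standing in for training are assumptions for illustration, not the code from this thread.

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(4, 4)  # toy stand-in for the real network

sd_alias = model.state_dict()                # tensors share storage with the live parameters
sd_copy = copy.deepcopy(model.state_dict())  # independent snapshot of the current values

# Stand-in for a training step: modify the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

alias_tracks_model = torch.equal(sd_alias["weight"], model.weight)
copy_tracks_model = torch.equal(sd_copy["weight"], model.weight)
print(alias_tracks_model, copy_tracks_model)
```

If the deep-copied snapshot still matches the model after "training", the more likely explanation is that the parameters were never updated at all.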

ptrblck · February 23, 2022, 7:49am

I cannot reproduce the issue using:

import copy

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1)
sd_ref = copy.deepcopy(model.state_dict())

for _ in range(10):
    optimizer.zero_grad()
    out = model(torch.randn(1, 1))
    out.mean().backward()
    optimizer.step()
    
sd = copy.deepcopy(model.state_dict())

for key1, key2 in zip(sd_ref, sd):
    print((sd_ref[key1] - sd[key2]).abs().max())
# > tensor(1.3497)
#   tensor(10.)
    
print(sd['weight'] == sd_ref['weight'])
# > tensor([[False]])
print(sd['bias'] == sd_ref['bias'])
# > tensor([False])

Are you sure the “global” network was changed and not a local object in train?
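
To illustrate what is meant by a local object: the following is a minimal sketch of a hypothetical train function that optimizes a private copy instead of the global network. The name train_local_copy and its body are assumptions made up for this example, not the code from the original post.

```python
import copy

import torch
import torch.nn as nn

network = nn.Linear(1, 1)  # the "global" model


def train_local_copy(steps):
    # Hypothetical buggy loop: it optimizes a private copy,
    # so the global `network` never receives an update.
    local_net = copy.deepcopy(network)
    optimizer = torch.optim.SGD(local_net.parameters(), lr=1)
    for _ in range(steps):
        optimizer.zero_grad()
        local_net(torch.ones(1, 1)).mean().backward()
        optimizer.step()
    return local_net


before = copy.deepcopy(network.state_dict())
trained = train_local_copy(3)
after = copy.deepcopy(network.state_dict())

# Both snapshots of the global model are identical, although training ran.
global_unchanged = all(torch.equal(before[k], after[k]) for k in before)
local_changed = not torch.equal(trained.bias, network.bias)
print(global_unchanged, local_changed)
```

In a case like this, the two deep-copied state dicts compare equal even though deepcopy itself works as expected.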

sofiane · February 23, 2022, 2:31pm

Sorry, that was a bug on my side: the training never actually ran, because a condition I set was never satisfied.
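
For anyone hitting the same symptom, a small sanity check can catch a training loop that silently does nothing. This is only a sketch: the helper state_dicts_equal, the should_update flag, and this simplified train function are hypothetical and stand in for whatever condition guards the real update.

```python
import copy

import torch
import torch.nn as nn


def state_dicts_equal(sd_a, sd_b):
    """Return True if both state dicts have the same keys and identical tensors."""
    return sd_a.keys() == sd_b.keys() and all(
        torch.equal(sd_a[k], sd_b[k]) for k in sd_a
    )


def train(network, episodes, should_update):
    # Hypothetical loop: `should_update` mimics a condition guarding the update.
    optimizer = torch.optim.SGD(network.parameters(), lr=1)
    steps = 0
    for _ in range(episodes):
        if not should_update:
            continue  # the update is skipped, so the weights stay untouched
        optimizer.zero_grad()
        network(torch.ones(1, 1)).mean().backward()
        optimizer.step()
        steps += 1
    return steps  # number of optimizer steps actually taken


network = nn.Linear(1, 1)
snapshot = copy.deepcopy(network.state_dict())

skipped_steps = train(network, 30, should_update=False)
unchanged_after_skip = state_dicts_equal(snapshot, network.state_dict())

real_steps = train(network, 30, should_update=True)
changed_after_training = not state_dicts_equal(snapshot, network.state_dict())
print(skipped_steps, unchanged_after_skip, real_steps, changed_after_training)
```

Returning the number of optimizer steps, or asserting that the state dict changed after training, makes a never-satisfied condition visible right away.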