sofiane
(so)
February 23, 2022, 4:32am
1
I’d like to make a deep copy of a model’s weights. It looks as if the first deepcopy only copies a reference to the state dictionary, which is not what deepcopy is supposed to do. train(30) trains the model for 30 episodes, yet it seems to affect the state_dict that I deep-copied beforehand.
model_sd = deepcopy(network.state_dict())
train(30)
another_model_sd = deepcopy(network.state_dict())
model_sd['l.0.weight'] == another_model_sd['l.0.weight']
I tried the same thing with a deepcopy of the network itself:
copy1 = deepcopy(network)
train(30)
copy2 = deepcopy(network)
copy1.l[0].weight == copy2.l[0].weight
Output for both methods:
tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True],
        [True, True, True, True],
…
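For context on the "reference dictionary" concern above, here is a minimal sketch of the difference between a plain state_dict() and a deep-copied one. It assumes a small stand-in nn.Linear model (not the network from this post) and replaces training with a manual in-place update; the variable names are illustrative only.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 4)  # stand-in model, not the network from the post

# state_dict() returns tensors that share storage with the live parameters.
live_view = model.state_dict()
# deepcopy clones every tensor, so the snapshot is independent of the model.
snapshot = copy.deepcopy(model.state_dict())

# Simulate a training update by modifying the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

# The plain state_dict follows the model; the deep copy keeps the old values.
view_tracks_model = torch.equal(live_view["weight"], model.weight)
snapshot_unchanged = not torch.equal(snapshot["weight"], model.weight)
print(view_tracks_model, snapshot_unchanged)
```

Under these assumptions, a deep-copied state_dict should only match the current weights if the weights themselves never changed.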
ptrblck
February 23, 2022, 7:49am
2
I cannot reproduce the issue using:
import copy

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1)

# Snapshot the parameters before training
sd_ref = copy.deepcopy(model.state_dict())

for _ in range(10):
    optimizer.zero_grad()
    out = model(torch.randn(1, 1))
    out.mean().backward()
    optimizer.step()

# Snapshot again after training and compare against the reference
sd = copy.deepcopy(model.state_dict())
for key1, key2 in zip(sd_ref, sd):
    print((sd_ref[key1] - sd[key2]).abs().max())
# > tensor(1.3497)
# tensor(10.)
print(sd['weight'] == sd_ref['weight'])
# > tensor([[False]])
print(sd['bias'] == sd_ref['bias'])
# > tensor([False])
Are you sure the “global” network was changed and not a local object in train?
sofiane
(so)
February 23, 2022, 2:31pm
3
Sorry, that was a bug on my side: the training never actually ran, because a condition I set was never satisfied.
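A quick sanity check can catch this kind of silent no-op. The sketch below is only an illustration: changed_keys is a hypothetical helper, the nn.Linear model is a stand-in, and a single optimizer step takes the place of train(30).

```python
import copy

import torch
import torch.nn as nn


def changed_keys(sd_before, sd_after):
    """Return the state_dict keys whose tensors differ between two snapshots."""
    return [k for k in sd_before if not torch.equal(sd_before[k], sd_after[k])]


torch.manual_seed(0)
model = nn.Linear(4, 4)  # stand-in model, not the network from the post
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

before = copy.deepcopy(model.state_dict())

# Stand-in for train(30): a single optimizer step on random data.
optimizer.zero_grad()
model(torch.randn(8, 4)).mean().backward()
optimizer.step()

after = copy.deepcopy(model.state_dict())
changed = changed_keys(before, after)
# An empty list here would mean the training loop never updated the model.
assert changed, "no parameter changed; did the training loop actually run?"
print(changed)
```

If the assertion fails, the two snapshots are identical, which points at the training loop rather than at deepcopy.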