Deepcopy vs. load_state_dict

Hi,

Is there any advantage to using load_state_dict over copy.deepcopy (or vice versa) when one wants to "deep" copy one model to another, i.e. so that updating model_B does not change model_A after the copy?

import copy
import torch
import torch.nn as nn

class DummyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)  # placeholder layer

    def forward(self, x):
        return self.fc(x)

Method 1:

model_A = DummyNet()
model_B = copy.deepcopy(model_A)

Method 2:

model_A = DummyNet()
model_B = DummyNet()
model_B.load_state_dict(model_A.state_dict())

They seem to lead to the same result in practice, but I want to make sure.

Thanks,


As you said, both approaches should work, so that a parameter update in one model does not change the other and vice versa:

import torch
from torchvision import models

modelA = models.resnet18()
modelB = models.resnet18()
modelA.load_state_dict(modelB.state_dict())

# Compare random parameter
print((modelA.fc.weight == modelB.fc.weight).all())
> tensor(True)

# Manipulate param
with torch.no_grad():
    modelA.fc.weight.zero_()

print((modelA.fc.weight == modelB.fc.weight).all())
> tensor(False)
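
For completeness, copy.deepcopy passes the same check. A minimal sketch continuing from the snippet above (modelC is just a name picked here for the copy):

import copy

modelC = copy.deepcopy(modelA)

# The copy starts out identical to the original
print((modelA.fc.weight == modelC.fc.weight).all())
> tensor(True)

# Manipulating the copy leaves the original untouched
with torch.no_grad():
    modelC.fc.weight.fill_(1.)

print((modelA.fc.weight == modelC.fc.weight).all())
> tensor(False)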

Does copy.deepcopy(model_A) copy the state variables as well, as model_B.load_state_dict(model_A.state_dict()) does? If so, in what situations is one preferred over the other for a deep copy?
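
One way to check is to compare a buffer (a "state variable" such as a batch norm running mean) after each kind of copy. A minimal sketch, assuming the torchvision resnet18 from the reply above and the thread's model_A/model_B/model_C naming:

import copy
from torchvision import models

model_A = models.resnet18()

# Give model_A a non-default buffer value so the comparison is meaningful
model_A.bn1.running_mean.fill_(1.)

# deepcopy copies the whole module object, buffers included
model_B = copy.deepcopy(model_A)
print((model_A.bn1.running_mean == model_B.bn1.running_mean).all())
> tensor(True)

# Buffers are part of the state_dict, so load_state_dict transfers them too
model_C = models.resnet18()
model_C.load_state_dict(model_A.state_dict())
print((model_A.bn1.running_mean == model_C.bn1.running_mean).all())
> tensor(True)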