Modifying Weight of Pretrained Model in `forward()` Makes Training Slow

I don't know how exactly your model is used or where the computation graph might be stored, as I cannot reproduce any increase in memory usage using:


```python
import torch
import torch.nn as nn
from torchvision import models

device = 'cuda'
pre_model = models.resnet18().to(device)

# fixed scaling tensors A and learnable offsets B, one per parameter of the pretrained model
b = []
A = []
for params in pre_model.parameters():
    A.append(torch.rand_like(params))
    b_temp = nn.Parameter(torch.rand_like(params))
    b.append(b_temp.detach().clone())
B = nn.ParameterList(b)

# Net is the wrapper module from your post
modelwithAB = Net(pre_model, B)
optimizer = torch.optim.Adam(modelwithAB.parameters(), lr=1e-3)

image = torch.randn(2, 3, 224, 224).to(device)
print(torch.cuda.memory_allocated()/1024**2)

for _ in range(10):
    optimizer.zero_grad()
    out = modelwithAB(image, A)
    out.mean().backward()
    optimizer.step()
    print(torch.cuda.memory_allocated()/1024**2)
```

The print statements show approximately constant memory usage, which looks correct and indicates that no computation graph is being kept alive between iterations.
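
Since `Net` is not defined in your snippet, this is roughly what I assumed it looks like; the `W * A + B` update rule and the use of `torch.func.functional_call` are my assumptions and might differ from your actual implementation:

```python
import torch
import torch.nn as nn
from torch.func import functional_call


class Net(nn.Module):
    """Assumed wrapper: rebuilds each weight of the pretrained model inside forward()."""

    def __init__(self, pre_model, B):
        super().__init__()
        self.pre_model = pre_model
        self.B = B  # learnable offsets, one per parameter of pre_model

    def forward(self, x, A):
        # Rebuild each weight as W * A + B for this forward pass only.
        # The modified tensors are created fresh every iteration, so the
        # autograd graph of one step is freed after backward() and memory
        # stays constant.
        new_params = {
            name: p * a + b
            for (name, p), a, b in zip(self.pre_model.named_parameters(), A, self.B)
        }
        return functional_call(self.pre_model, new_params, (x,))
```

The important part in this sketch is that the modified weights are recreated inside `forward()` instead of being assigned back to `pre_model`, so nothing from a previous iteration's graph stays referenced. If your `Net` stores intermediate tensors on the module (or appends them to a list) across iterations, that would explain growing memory and slower training.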