# Modifying the Weights of a Pretrained Model in `forward()` Makes Training Progressively Slower

I have two sets of tensors, A and B, that I want to use to replace all the weights of a pre-trained model. In other words, I want to use the model's forward computation, but not its own weights.

Concretely, I want to set each weight of the model to W = A + B, where A is a fixed tensor (not trainable) and B is a trainable parameter. So, in the end, my aim is to train B through the structure of the pre-trained model.
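
For a single layer, the idea would be something like this rough sketch (just to illustrate what I mean by W = A + B, not my actual code):

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 4, bias=False)
A = torch.rand_like(lin.weight)                 # fixed part, not trainable
B = nn.Parameter(torch.zeros_like(lin.weight))  # trainable part

x = torch.randn(2, 4)
W = A + B                   # the effective weight I want the layer to use
out = x @ W.t()             # same computation as lin(x), but with W instead of lin.weight
out.sum().backward()
print(B.grad is not None)   # True: gradients reach B
print(A.requires_grad)      # False: A stays fixed
```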

This is my attempt:

```python
import copy

import torch.nn as nn


class Net(nn.Module):
    def __init__(self, pre_model, B):
        super(Net, self).__init__()
        self.B = B
        self.model = copy.deepcopy(pre_model)
        for params in self.model.parameters():
            params.requires_grad = False  # freeze the copied weights so they can be overwritten in forward()

    def forward(self, x, A):
        for i, params in enumerate(self.model.parameters()):
            params.copy_(A[i].detach().clone())
            params.add_(self.B[i])  # W = A + B; B stays in the autograd graph

        x = self.model(x)
        return x
```

I checked during training, and B does get updated. But the problem is that training keeps getting slower with every iteration:

```
Epoch 1:
24%|██▍ | 47/196 [00:05<00:23, 6.44it/s]
57%|█████▋ | 111/196 [00:18<00:19, 4.28it/s]
96%|█████████▋| 189/196 [00:41<00:02, 2.90it/s]
Epoch 2:
6%|▌ | 11/196 [00:04<01:14, 2.50it/s]
```

I think I have detached all the tensors correctly, but I am not sure why this happens. I hope somebody here can help me figure out what exactly is going on.
Thanks

Based on your code snippet, you are detaching `A`, which is the fixed tensor, while you are adding `B` to `params`, potentially including its entire computation graph. Could you double check this, please?
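
As a rough toy example of what I mean (assuming your `forward()` copies `A` into the weights and then adds `self.B` in place):

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 4, bias=False)
lin.weight.requires_grad_(False)               # frozen, so the in-place copy_ is allowed
A = torch.rand_like(lin.weight)                # fixed part
B = nn.Parameter(torch.rand_like(lin.weight))  # trainable part

x = torch.randn(2, 4)
for step in range(3):
    lin.weight.copy_(A.detach())   # overwrite with the fixed part
    lin.weight.add_(B)             # B enters the weight's autograd graph here
    out = lin(x).sum()
    # the weight is no longer a plain frozen leaf: it now carries a grad_fn
    # tied to B's computation graph
    print(step, lin.weight.grad_fn)
```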

Basically, this is what I did to get A and B (it's a little bit different from my real code, but I've tested the snippet below and it still shows the slowdown):

```python
b = []
A = []
for params in list(pre_model.parameters()):
    A.append(torch.rand_like(params))
    b_temp = nn.Parameter(torch.rand_like(params))
    b.append(b_temp.detach().clone())
B = nn.ParameterList(b)

modelwithAB = Net(pre_model, B)
# ...
# in the training iteration
out = modelwithAB(image, A)
```

Hi @ptrblck, in another discussion I read that you suggest using a `no_grad()` context when modifying model parameters. But in my case I didn't use it, because I want B to be updated by the optimizer through the `add_` operation.

I tried using `no_grad()`, and it made the training time stable (no more slowdown), but, as I expected, B then no longer updated. I am not sure which approach is closer to the solution.
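
I also wonder whether building W = A + B outside the model and running the forward functionally would avoid this dilemma, e.g. with `torch.func.functional_call` in recent PyTorch versions. Just a rough sketch of the idea (I haven't verified that it fixes the slowdown):

```python
from torch.func import functional_call

# assumes pre_model, A (list of fixed tensors), B (list of trainable tensors)
# and image already exist as in the snippets above
param_names = [name for name, _ in pre_model.named_parameters()]

# build the effective weights W = A + B for this forward pass only
overrides = {name: A[i].detach() + B[i] for i, name in enumerate(param_names)}

# run pre_model's forward with the overridden weights; gradients flow into B,
# and pre_model's own parameters are never modified in place
out = functional_call(pre_model, overrides, (image,))
```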

I don’t know how exactly your model is used and where the computation graph might be stored, as I cannot reproduce any increase in memory usage using:

```python
import torch
import torch.nn as nn
from torchvision import models

device = 'cuda'
pre_model = models.resnet18().to(device)
b = []
A = []
for params in list(pre_model.parameters()):
    A.append(torch.rand_like(params))
    b_temp = nn.Parameter(torch.rand_like(params))
    b.append(b_temp.detach().clone())
B = nn.ParameterList(b)

modelwithAB = Net(pre_model, B)
# optimizer was not shown in the original snippet; assuming e.g.
optimizer = torch.optim.SGD(modelwithAB.parameters(), lr=1e-3)

image = torch.randn(2, 3, 224, 224).to(device)
print(torch.cuda.memory_allocated()/1024**2)

for _ in range(10):
    out = modelwithAB(image, A)
    out.mean().backward()
    optimizer.step()
    print(torch.cuda.memory_allocated()/1024**2)
```

The `print` statements show approximately constant memory usage, which looks correct.

Thank you, @ptrblck, for trying to reproduce this error.

I have modified your code by adding `tqdm` and logging the time spent in `backward()` and `step()`. Even though the memory usage stays the same, you can see that the iterations keep slowing down:

```python
import time

from tqdm import tqdm

for i in tqdm(range(300)):  # longer run, added tqdm
    out = modelwithAB(image, A)
    start = time.time()  # for logging the backpropagation's duration
    out.mean().backward()
    optimizer.step()
    if i % 40 == 0:
        print("-", torch.cuda.memory_allocated()/1024**2, "-", time.time()-start)
```

```
  1%|          | 3/300 [00:00<00:10, 28.02it/s] - 1457.7119140625 - 0.02620530128479004
 14%|█▍        | 42/300 [00:02<00:18, 14.10it/s] - 1365.5771484375 - 0.06569838523864746
 27%|██▋       | 82/300 [00:06<00:29,  7.47it/s] - 1365.5693359375 - 0.12723588943481445
 41%|████      | 122/300 [00:13<00:33,  5.37it/s] - 1365.5693359375 - 0.17061519622802734
 54%|█████▎    | 161/300 [00:21<00:33,  4.10it/s] - 1365.5693359375 - 0.23227190971374512
 67%|██████▋   | 201/300 [00:32<00:30,  3.30it/s] - 1365.5693359375 - 0.29410719871520996
 80%|████████  | 241/300 [00:46<00:20,  2.82it/s] - 1365.5693359375 - 0.3430461883544922
 94%|█████████▎| 281/300 [01:02<00:07,  2.41it/s] - 1365.5693359375 - 0.40257716178894043
100%|██████████| 300/300 [01:10<00:00,  4.26it/s]
```

I have tried removing `tqdm`, but training still keeps slowing down.

Hi @ptrblck, I am sorry to keep mentioning you, but could you help me check the code above? I am still stuck on this problem and don't really know how to track/debug it further.
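
The only check I could think of so far is to count the nodes of the autograd graph after each forward pass, to see whether the graph really keeps growing across iterations. This is just a rough helper I sketched (I'm not sure it is the right way to debug this):

```python
def count_graph_nodes(tensor):
    """Walk the autograd graph starting from `tensor` and count its nodes."""
    seen, stack = set(), [tensor.grad_fn]
    while stack:
        fn = stack.pop()
        if fn is None or fn in seen:
            continue
        seen.add(fn)
        stack.extend(next_fn for next_fn, _ in fn.next_functions)
    return len(seen)

loss = modelwithAB(image, A).mean()
# if this number keeps increasing across iterations, the old graphs are being
# carried over into the new one
print(count_graph_nodes(loss))
```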