I have two parameters, A and B, that should replace all the weights of a pre-trained model. In other words, I want to reuse the model's forward computation but not its weights.
I want to set the model's weights to W = A + B, where A is a fixed tensor (not trainable) and B is a trainable parameter. So, in the end, my aim is to train B inside the structure of the pre-trained model.
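One standard way to express W = A + B without mutating the model's parameters in place is torch.func.functional_call (PyTorch >= 2.0), which runs a module's forward pass with substituted weights. A minimal sketch with a stand-in nn.Linear (the real model, names A/B, and shapes here are my assumptions, not the thread's actual model; note that nested modules have dotted parameter names, which nn.ParameterDict keys cannot contain without mangling):

```python
import torch
import torch.nn as nn
from torch.func import functional_call  # PyTorch >= 2.0

pre_model = nn.Linear(4, 2)  # stand-in for the pre-trained model

# One fixed tensor A and one trainable parameter B per model weight
A = {n: torch.rand_like(p) for n, p in pre_model.named_parameters()}
B = nn.ParameterDict({n: nn.Parameter(torch.zeros_like(p))
                      for n, p in pre_model.named_parameters()})

x = torch.randn(3, 4)
W = {n: A[n] + B[n] for n in A}           # rebuilt fresh on every forward
out = functional_call(pre_model, W, (x,))  # forward pass with W instead of the stored weights
out.mean().backward()
print(B["weight"].grad is not None)        # gradients flow into B only
```

Because W is rebuilt from A and B on each forward, no autograd graph survives between iterations, and only B receives gradients.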
This is my attempt:
class Net(nn.Module):
    def __init__(self, pre_model, B):
        super(Net, self).__init__()
        self.B = B
        with torch.no_grad():
            self.model = copy.deepcopy(pre_model)
            for params in self.model.parameters():
                params.requires_grad = False

    def forward(self, x, A):
        for i, params in enumerate(self.model.parameters()):
            params.copy_(A[i].detach().clone())
            params.add_(self.B[i])
            params.retain_grad()
        x = self.model(x)
        return x
I checked during training, and B was indeed being updated. But the problem is that every training iteration keeps getting slower:
I think I have detached all the tensors correctly, but I am not sure why this happens. I hope somebody here can help me figure out what exactly is going on.
Thanks
Based on your code snippet, you are detaching A, which is the fixed tensor, while you are adding B to params, potentially including its entire computation graph. Could you double-check this, please?
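The growing-graph effect can be illustrated with a minimal standalone example (my construction, not the thread's code): repeatedly applying an in-place op that involves a tensor with requires_grad=True keeps extending the autograd graph attached to the target, so each backward() has more nodes to traverse than the last.

```python
import torch

w = torch.zeros(3)                     # persists across iterations, like a model weight
b = torch.ones(3, requires_grad=True)  # trainable, like B

def graph_depth(t):
    """Count grad_fn nodes along the first-input chain of t's graph."""
    depth, fn = 0, t.grad_fn
    while fn is not None:
        depth += 1
        fn = fn.next_functions[0][0]
    return depth

depths = []
for _ in range(5):
    w.add_(b)                          # recorded by autograd on every call
    depths.append(graph_depth(w))
print(depths)
```

The depth grows by one per iteration: the AddBackward node from every previous step is still attached to w, which is exactly the kind of accumulation that makes training progressively slower.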
Basically, this is what I did to get A and B (it's a little different, but I've tested the code below and it still shows the slowdown):
b = []
A = []
for params in pre_model.parameters():
    A.append(torch.rand_like(params))
    b_temp = nn.Parameter(torch.rand_like(params))
    b.append(b_temp.detach().clone())
B = nn.ParameterList(b)
modelwithAB = Net(pre_model, B)
# ...
# in the training iteration
out = modelwithAB(image, A)
Hi @ptrblck, from another discussion I read that you suggest using the no_grad() context when modifying model parameters. In my case, I didn't use it because I want B to be updated by the optimizer through the add_ operation.
I tried using no_grad(), and it made the training time stable (no slowdown), but, as I expected, B was then no longer updated. I am not sure which approach is closer to the solution.
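A middle ground between the two extremes is to drop the stale graph on each parameter before rewriting it: detach the parameter, copy the fixed part A under no_grad, then add B outside no_grad so exactly one add is recorded per forward. This is a sketch of my own (a tiny stand-in model, not the thread's code or a confirmed fix):

```python
import copy
import torch
import torch.nn as nn

pre_model = nn.Linear(4, 2)                 # stand-in for the pre-trained model
model = copy.deepcopy(pre_model)
A = [torch.rand_like(p) for p in model.parameters()]
B = nn.ParameterList(nn.Parameter(torch.rand_like(p)) for p in model.parameters())
optimizer = torch.optim.SGD(B.parameters(), lr=0.1)

x = torch.randn(3, 4)
for _ in range(5):
    optimizer.zero_grad()
    for p, a, b in zip(model.parameters(), A, B):
        p.detach_()                 # drop any grad_fn left over from the last step
        with torch.no_grad():
            p.copy_(a)              # fixed part: nothing recorded
        p.add_(b)                   # trainable part: recorded once, grads reach B
    model(x).mean().backward()
    optimizer.step()

print(all(b.grad is not None for b in B))  # B receives gradients
```

Since detach_() severs each parameter from the previous iteration's graph, the per-iteration graph stays one AddBackward deep, so backward() time should not grow.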
Thank you, @ptrblck, for trying to reproduce this error.
I have modified your code by adding tqdm and logging the time for backward() and step(); even though the memory usage stays the same, you can see that the training time is slowing down.
for i in tqdm(range(300)):  # longer run, added tqdm
    optimizer.zero_grad()
    out = modelwithAB(image, A)
    start = time.time()  # log the duration of backward() and step()
    out.mean().backward()
    optimizer.step()
    if i % 40 == 0:
        print("-", torch.cuda.memory_allocated() / 1024**2, "-", time.time() - start)
Hi @ptrblck, I am sorry to keep mentioning you, but could you help me check the code above? I am still stuck on this problem, and I don't have any clue how to track or debug it in this situation.