I have trained model weights, and I want to replace them with weights + some random vector, then update only the random vector during training while keeping the original model weights frozen.
Pseudocode:
# some_random_tensor has requires_grad=True
for layer in model_layers:
    layer.weight = nn.Parameter(layer.weight.data + some_random_tensor)
During training, I want to update only some_random_tensor, not the model weights. How can I approach this problem?
One approach would be to freeze all parameters in the original layer and create some_random_tensor as a new nn.Parameter.
The original module and the new parameter could be initialized in a custom nn.Module, and in its forward method you could use the functional API (e.g. via F.linear) to apply your operation.
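A minimal sketch of that suggestion, assuming a single trained nn.Linear layer. The class name DeltaLinear, the attribute name delta, and the 0.01 initialization scale are hypothetical choices for illustration, not something from the original posts:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeltaLinear(nn.Module):
    """Wraps a trained nn.Linear, freezes it, and learns an additive offset."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the trained weight (and bias)
        # Trainable random offset with the same shape as the frozen weight
        # (the 0.01 scale is an assumption).
        self.delta = nn.Parameter(torch.randn_like(self.base.weight) * 0.01)

    def forward(self, x):
        # The sum is rebuilt on every call, so gradients flow into delta only.
        return F.linear(x, self.base.weight + self.delta, self.base.bias)


layer = DeltaLinear(nn.Linear(4, 3))
# Hand only the new parameter to the optimizer.
optimizer = torch.optim.SGD([layer.delta], lr=0.1)
frozen_before = layer.base.weight.clone()

loss = layer(torch.randn(8, 4)).pow(2).sum()
loss.backward()
optimizer.step()
```

After the step, layer.base.weight is unchanged and has no gradient, while layer.delta has been updated.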
Can you help me with a code snippet for the custom nn.Module?
Below I tried to use nn.Linear, but the weights are still updating.
# eign_vectors: a tensor containing the eigenvectors for each layer.
for i, layer in enumerate(layers):
    alpha = torch.nn.Linear(shape[0], shape[1], bias=False)
    addn_to_weights = torch.matmul(alpha.weight, eign_vectors[i])
    model.layers[i].weight = torch.nn.Parameter(
        torch.add(model.layers[i].weight.clone().detach(), addn_to_weights)
    )
    # Here I don't want to update the weights; only alpha should be updated in the next training process.
alpha is not used in the computation graph, but instead is used to create the weight parameter.
If you want to update alpha and keep weight constant, create weight as a tensor and use alpha during the forward pass.
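As a sketch of this advice, assuming the per-layer eigenvector matrix is available as a plain tensor: the frozen weight and the eigenvectors are stored as buffers (tensors, not parameters), and alpha is the only nn.Parameter. The names AlphaLinear and basis, the shapes, and the initialization scale are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlphaLinear(nn.Module):
    def __init__(self, weight: torch.Tensor, basis: torch.Tensor):
        super().__init__()
        # Buffers follow .to(device) and appear in state_dict, but they are
        # not parameters, so an optimizer built from parameters() skips them.
        self.register_buffer("weight", weight.detach().clone())
        self.register_buffer("basis", basis.detach().clone())
        # alpha: (out_features, k), basis: (k, in_features)
        self.alpha = nn.Parameter(torch.randn(weight.shape[0], basis.shape[0]) * 0.01)

    def forward(self, x):
        # alpha takes part in the graph on every forward pass.
        return F.linear(x, self.weight + self.alpha @ self.basis)


trained = nn.Linear(6, 3, bias=False)  # stands in for a trained layer
basis = torch.randn(2, 6)              # stands in for eign_vectors[i]
layer = AlphaLinear(trained.weight, basis)
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-2)
alpha_before = layer.alpha.detach().clone()

for _ in range(2):  # several steps work without retain_graph=True
    optimizer.zero_grad()
    loss = layer(torch.randn(8, 6)).pow(2).sum()
    loss.backward()
    optimizer.step()
```

Because weight is a buffer rather than a parameter, layer.parameters() yields only alpha, so nothing else can be updated by the optimizer.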
Your implementation works, but when I put F.linear() inside a for loop like below:
def my_model(x):
    for i, old_param in enumerate(model.parameters()):
        x = F.linear(x, old_param + new_weight[i])
    return x
I get an error:
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
When I call loss.backward(retain_graph=True), the following error occurs:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 399]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Here is my full implementation:
import torch
import torch.nn as nn
import torch.nn.functional as func

alphas = []
new_weights = []
for i, param in enumerate(model.parameters()):
    eign_shape = params_dicts[f"layer_{i}"]["eign_vector"].shape
    layer_shape = param.shape
    alpha = nn.Parameter(torch.randn(layer_shape[0], eign_shape[0]))
    alphas.append(alpha)
    # Computed once here, outside the training loop.
    new_weights.append(torch.matmul(alpha, params_dicts[f"layer_{i}"]["eign_vector"]))

def new_model(x):  # x represents batch images.
    for i, old_param in enumerate(model.parameters()):
        x = func.linear(x, torch.add(old_param, new_weights[i]))
    return x

for epoch in range(num_epochs):
    for batch, (images, labels) in enumerate(train_loader):
        # print(batch)
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        with torch.set_grad_enabled(True):
            outputs = new_model(images)
            loss = criterion(outputs, labels)
            # Only the very last batch of the last epoch releases the graph.
            if epoch == (num_epochs - 1) and batch == (len(train_loader) - 1):
                loss.backward()
            else:
                loss.backward(retain_graph=True)
            optimizer.step()
Hi @ptrblck, thank you for the reply. I solved it by moving the line torch.matmul(alpha, params_dicts[f"layer_{i}"]["eign_vector"]) inside the for loop, so the updated code is:
def new_model(x):  # x represents batch images.
    # print(x.shape)
    for i, old_param in enumerate(model.parameters()):
        weight = torch.matmul(alphas[i], params_dicts[f"layer_{i}"]["eign_vector"])
        x = func.linear(x, old_param + weight)
    return x
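Why moving the matmul helps: when the product is computed once before training, its piece of the autograd graph is also built only once. The first backward() frees the tensors saved in that graph, so the second backward() fails. retain_graph=True keeps the stale graph alive, but the precomputed product still holds the values from the initial alpha, and the later in-place parameter updates from optimizer.step() are the likely trigger for the second error. Recomputing the product inside the forward function builds a fresh graph on every iteration. A small self-contained reproduction, where frozen, basis, and the shapes are made-up stand-ins rather than the original model:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
frozen = torch.randn(3, 5)                     # stands in for a frozen trained weight
basis = torch.randn(2, 5)                      # stands in for an eigenvector matrix
alpha = torch.randn(3, 2, requires_grad=True)  # the only trainable tensor
optimizer = torch.optim.SGD([alpha], lr=0.1)
x = torch.randn(4, 5)

# Broken: the product is built once, outside the training loop.
stale = alpha @ basis
error = None
try:
    for _ in range(2):
        optimizer.zero_grad()
        F.linear(x, frozen + stale).sum().backward()
        optimizer.step()
except RuntimeError as exc:
    error = exc  # raised by the second backward() through the same graph

# Fixed: the product is rebuilt on every iteration.
losses = []
for _ in range(3):
    optimizer.zero_grad()
    weight = frozen + alpha @ basis
    loss = F.linear(x, weight).pow(2).sum()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

With the fixed loop, plain loss.backward() is enough and retain_graph=True is no longer needed.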