How to freeze weights and update a random vector?

Hi,

I have a trained model's weights, and I want to replace those weights with weights + some_random_tensor and update only the random tensor, not the model weights (i.e. freeze the weights).

Pseudocode:

# some_random_tensor with requires_grad==True
for layer in model_layers:
    layer.weight = nn.Parameter(layer.weight.data + some_random_tensor)

So during training I only want to update some_random_tensor, not the model weights. How can I approach this problem?

Thanks!

One approach would be to freeze all parameters in the original layer and create some_random_tensor as a new nn.Parameter.
The original module and the new parameter could be initialized in a custom nn.Module, and in its forward method you could use the functional API (e.g. F.linear) to apply your operation, as sketched below.
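
A minimal sketch of that idea (the class name and the zero initialization are just for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbedLinear(nn.Module):
    def __init__(self, lin):
        super().__init__()
        self.lin = lin
        # freeze the original weight
        self.lin.weight.requires_grad_(False)
        # trainable additive tensor with the same shape as the frozen weight
        self.delta = nn.Parameter(torch.zeros_like(self.lin.weight))

    def forward(self, x):
        # the functional API applies (weight + delta); only delta receives gradients
        return F.linear(x, self.lin.weight + self.delta, self.lin.bias)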


Hi @ptrblck,

could you help me with a code snippet for the custom nn.Module?

Below I tried to use nn.Linear, but the weights are still updating.

# eign_vectors is a tensor containing the eigenvector for each layer
for i, layer in enumerate(layers):
    alpha = torch.nn.Linear(shape[0], shape[1], bias=False)
    addn_to_weights = torch.matmul(alpha.weight, eign_vectors[i])
    model.layers[i].weight = torch.nn.Parameter(
        torch.add(model.layers[i].weight.clone().detach(), addn_to_weights)
    )

# Here I don't want to update the weights, only the alpha values in the next training run.

Thanks!

Here is a code snippet of my idea:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Setup
lin = nn.Linear(5, 5, bias=False)
weight = nn.Parameter(torch.randn(5, 5))
optimizer = torch.optim.SGD([weight], lr=1.)

# Freeze lin.weight
lin.weight.requires_grad = False

# Forward pass
x = torch.randn(1, 5)
out = F.linear(x, lin.weight + weight)

# Backward pass
out.mean().backward()

# Check grads
print(lin.weight.grad)
> None

print(weight.grad)
> tensor([[-0.0892, -0.3502, -0.0987, -0.3553, -0.2443],
          [-0.0892, -0.3502, -0.0987, -0.3553, -0.2443],
          [-0.0892, -0.3502, -0.0987, -0.3553, -0.2443],
          [-0.0892, -0.3502, -0.0987, -0.3553, -0.2443],
          [-0.0892, -0.3502, -0.0987, -0.3553, -0.2443]])

# Step
optimizer.step()

Let me know if this would work for you.

Thank you @ptrblck,

I modified the code for my use case, but the alpha variable in the following code does not get a gradient. What am I missing?

import torch
import torch.nn as nn
import torch.nn.functional as F

# Setup
lin = nn.Linear(5, 5, bias=False)
alpha = nn.Parameter(torch.randn(5, 2))
eign_vect = torch.rand(2, 5)
weight = nn.Parameter(torch.matmul(alpha, eign_vect))
optimizer = torch.optim.SGD([weight, alpha], lr=1.)

# Freeze lin.weight
lin.weight.requires_grad = False

# Forward pass
x = torch.randn(1, 5)
out = F.linear(x, lin.weight + weight)
# Backward pass
out.mean().backward()

# Check grads
print(lin.weight.grad)
# > None
print(alpha.grad)
# > None
print(weight.grad)
# > tensor([[-0.0892, -0.3502, -0.0987, -0.3553, -0.2443],
#           [-0.0892, -0.3502, -0.0987, -0.3553, -0.2443],
#           [-0.0892, -0.3502, -0.0987, -0.3553, -0.2443],
#           [-0.0892, -0.3502, -0.0987, -0.3553, -0.2443],
#           [-0.0892, -0.3502, -0.0987, -0.3553, -0.2443]])

# Step
optimizer.step()

And I only want to update the alpha variable (not weight). How can I do it?

alpha is not used in the computation graph; it is only used once to create the weight parameter, and wrapping the matmul result in nn.Parameter creates a new leaf tensor, so the connection back to alpha is lost.
If you want to update alpha and keep the original weight constant, create weight as a plain tensor from alpha during the forward pass.

How do I use alpha in the forward pass? I am a bit confused.

This should work:

lin = nn.Linear(5, 5, bias=False)
alpha = nn.Parameter(torch.randn(5, 2))
eign_vect = torch.rand(2, 5)
#weight = nn.Parameter(torch.matmul(alpha, eign_vect))
optimizer = torch.optim.SGD([alpha], lr=1.)

# Freeze lin.weight
lin.weight.requires_grad = False

# Forward pass
x = torch.randn(1, 5)
weight = torch.matmul(alpha, eign_vect)
out = F.linear(x, lin.weight + weight)

# Backward pass
out.mean().backward()

# Check grads
print(lin.weight.grad)
print(alpha.grad)
print(weight.grad)

This approach would calculate the gradients for alpha and would recompute weight using the new and updated alpha value.
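
For completeness, a small illustrative loop reusing the names from the snippet above, showing that weight is rebuilt from the current alpha in every iteration:

for step in range(3):
    optimizer.zero_grad()
    x = torch.randn(1, 5)
    # recompute weight from the current alpha; weight itself is not a parameter
    weight = torch.matmul(alpha, eign_vect)
    out = F.linear(x, lin.weight + weight)
    out.mean().backward()
    # only alpha was passed to the optimizer, so only alpha is updated
    optimizer.step()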

Thank you @ptrblck,

your implementation is working, but when I put F.linear() inside a for loop like below:

def my_model(x):
    for i, old_param in enumerate(model.parameters()):
        x = F.linear(x, old_param + new_weights[i])

    return x

I am getting an error:


RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

When I do loss.backward(retain_graph=True), the following error comes up:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 399]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Here is my full implementation:


alphas = []
new_weights = []
for i, param in enumerate(model.parameters()):
    eign_shape = params_dicts[f"layer_{i}"]["eign_vector"].shape
    layer_shape = param.shape
    alpha = nn.Parameter(torch.randn(layer_shape[0], eign_shape[0]))
    alphas.append(alpha)
    new_weights.append(torch.matmul(alpha, params_dicts[f"layer_{i}"]["eign_vector"]))

def new_model(x): # x represents batch images.
    
    for i, old_param in enumerate(model.parameters()):
        x = func.linear(x, torch.add(old_param, new_weights[i]))
    
    return x


for epoch in range(num_epochs):
    for batch, (images, labels) in enumerate(train_loader):
        # print(batch)
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        with torch.set_grad_enabled(True):
            outputs = new_model(images)
            loss = criterion(outputs, labels)
        
            if epoch == (num_epochs - 1) and batch == (len(train_loader) - 1):
                loss.backward()
            else:
                loss.backward(retain_graph=True)
            optimizer.step()

Could you post an executable code snippet, please?
Currently most of the objects are undefined, so I can only speculate about what to use for them.

Hi @ptrblck, thank you for the reply. I solved it by moving the line torch.matmul(alpha, params_dicts[f"layer_{i}"]["eign_vector"]) inside the for loop, so the weight is recomputed from the current alphas on every forward pass instead of reusing a graph that was built once outside the loop. The updated code is:

def new_model(x): # x represents batch images.
    # print(x.shape)
    for i, old_param in enumerate(model.parameters()):
        weight = torch.matmul(alphas[i], params_dicts[f"layer_{i}"]["eign_vector"])
        x = func.linear(x, old_param + weight)
    
    return x
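
To make sure only the alphas get updated, the original model parameters also need to be frozen and only the alpha tensors passed to the optimizer. The optimizer setup isn't shown above, so this is just a sketch (the learning rate is arbitrary):

# freeze the original model weights so only the alphas are trained
for param in model.parameters():
    param.requires_grad_(False)

# optimize only the alpha tensors
optimizer = torch.optim.SGD(alphas, lr=1e-3)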