How to change the weights of a PyTorch model?

I need to change the weights at specific layers of ResNet-152 during training.
I think a similar question was asked some time ago, but I cannot find it!


Would you like to change the weights manually?
If so, you could wrap the code in a torch.no_grad() guard:

with torch.no_grad():
    model.fc.weight[0, 0] = 1.

to prevent Autograd from tracking these changes.
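A minimal, self-contained sketch of this pattern, using a bare `nn.Linear` as a stand-in for `model.fc`:

```python
import torch
import torch.nn as nn

fc = nn.Linear(4, 2)  # stand-in for model.fc

with torch.no_grad():
    fc.weight[0, 0] = 1.0  # edit a single entry, untracked by autograd
    fc.weight.mul_(0.5)    # in-place ops on the whole tensor work too

# the parameter still requires grad afterwards; only the edits were untracked
print(fc.weight.requires_grad)  # True
print(fc.weight[0, 0].item())   # 0.5
```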


Thank you @ptrblck!
In fact, I am implementing an attention mechanism, and I want the weights I update according to the attention to take effect during training. From my understanding, torch.no_grad() should therefore only be used during testing/validation. Here’s the code I have in mind (for a simple demonstration, I replace the attention weights with random values):

    model.train()
    total_loss = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        # random stand-in for the real attention weights
        attention_matrix = torch.rand(model.fc.weight.size(), device=device)
        # NB: assigning a plain Tensor to a Parameter attribute raises a
        # TypeError; see the nn.Parameter fix later in this thread
        model.fc.weight = model.fc.weight * attention_matrix
        output = model(data)
        loss = criterion(output, target)
        total_loss += loss.item()
        loss.backward()
        optimizer.step()

During testing, however, I think we need to use torch.no_grad(), as follows:

    model.eval()
    total_loss = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            model.fc.weight = model.fc.weight * attention_matrix
            output = model(data)
            loss = criterion(output, target)
            total_loss += loss.item()

I also need to do the same for the convolution layers, but I am not yet sure how that could be done!

I’m not sure if you should manipulate the weights directly using the attention weights.
In the Seq2Seq tutorial the attention weights are multiplied with the encoder outputs to calculate new activations.
Would that work for you, too?
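For reference, the activation-based pattern from that tutorial looks roughly like this (the shapes and names here are illustrative, not the tutorial’s exact code):

```python
import torch

# toy shapes: batch of 1, five encoder states of size 8 (illustrative only)
encoder_outputs = torch.randn(1, 5, 8)
scores = torch.randn(1, 5)  # unnormalized attention scores

attn_weights = torch.softmax(scores, dim=-1)                     # (1, 5)
# weight the activations rather than any layer's weight matrix
context = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs)  # (1, 1, 8)
```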

I agree that manipulating the weights directly with attention is a bit risky, and the network might not converge. But I intend to do this using simulated annealing; hopefully it will work. I will not know unless I try it.

I still do not know how to change the parameters of the convolution layers.
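For reference, conv-layer parameters can be edited with the same torch.no_grad() pattern suggested earlier; a minimal sketch, with a bare `nn.Conv2d` standing in for one of ResNet-152’s conv layers and a hypothetical per-weight mask:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3)  # stand-in for a ResNet conv layer
mask = torch.rand_like(conv.weight)    # hypothetical per-weight mask

with torch.no_grad():
    conv.weight.mul_(mask)  # in-place edit, not tracked by autograd
    conv.bias.zero_()       # biases can be edited the same way
```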


To make it work, I needed to wrap the result of the multiplication in a Parameter, as follows:

model.fc.weight = torch.nn.Parameter(model.fc.weight * attention_matrix)
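A self-contained sketch of that fix, with `nn.Linear` standing in for model.fc. One caveat worth noting: re-wrapping creates a new Parameter object, so an optimizer constructed earlier still points at the old tensor and would need to be re-created (or have its param groups updated).

```python
import torch
import torch.nn as nn

fc = nn.Linear(10, 5)  # stand-in for model.fc
attention_matrix = torch.rand_like(fc.weight)

# re-wrapping in nn.Parameter makes the module accept the product again
fc.weight = nn.Parameter(fc.weight * attention_matrix)

out = fc(torch.randn(2, 10))
out.sum().backward()
print(fc.weight.grad is not None)  # True: the new Parameter receives gradients
```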


@Deeply Just to make sure: you were able to manipulate the weights manually using the code above? No harm to the model’s performance?
I am asking because I want to use your approach but for conv layer. Thanks.

Yes. It worked.
As far as I remember, the model was very tolerant to the introduced noise and was able to adapt and recover very quickly. After being hit by the first noise multiply, the MSE (or whatever metric used) was reduced a bit, but not drastically. Then, it started to recover and tolerate the noise. I was trying to generate different models at the mid-epoch training journey. Then, by having a noise generator, I aimed at building a high-level ensemble by combining all the models.

Training Configuration:
I don’t remember the exact configuration, but I can give an example of what I had in mind: suppose training runs for 100 epochs. The model learns normally until epoch 50, is then hit with this noise every 10 epochs ({50, 60, 70, 80, 90}), and stops at epoch 100. Something like that.

No harm to model’s performance?
The model’s performance depends on what you are trying to achieve and how you are manipulating the weights.


@Deeply Thanks for your confirmation.

Greetings,
I apologize for reviving this topic, it’s so close to my needs.
I want to change weights according to meta-information supplied with input images and I need intentionally to track these changes with Autograd.
I wonder whether simply not using torch.no_grad() is enough: if I don’t use anything, can I be sure that the results will be backpropagated in the usual way, and that the manual alteration is compatible with the “normal” gradient flow (when the weights are inside an nn module)?

I’m not sure I understand what you are trying to achieve.
Yet, IMHO, the best way to make sure that autograd does what it is supposed to do is to run the code in debug mode, track the changes, and see what gets updated via backpropagation. I usually follow this approach to be confident about what is happening. You can run this debugging test using a simple toy example that mimics what you are trying to do.
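A toy example of that kind of check might look like this (purely illustrative):

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.ones(3))
loss = (w * 2.0).sum()  # d(loss)/dw = 2 for each entry
loss.backward()
print(w.grad)  # tensor([2., 2., 2.]) confirms the op was tracked
```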


This doesn’t seem to be the right way: in my case, when I updated the weights like this, my gradients were non-zero even after calling optimizer.zero_grad().

To calculate the features with the updated weights, I used torch.nn.functional, since the conv layer is already initialized in __init__, and kept the new weights in a separate variable.
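That functional approach might look roughly like this (a sketch of the idea, not the poster’s actual code; the module name and shapes are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv(nn.Module):
    def __init__(self):
        super().__init__()
        # the conv layer is initialized in __init__ as usual
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x, weight_mask):
        # the updated weights live in a separate variable; the original
        # Parameter stays in the graph, so gradients flow through the product
        w = self.conv.weight * weight_mask
        return F.conv2d(x, w, self.conv.bias, padding=1)

model = MaskedConv()
x = torch.randn(2, 3, 16, 16)
mask = torch.rand_like(model.conv.weight)
out = model(x, mask)
out.sum().backward()
print(model.conv.weight.grad is not None)  # True: gradients reach the layer
```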

In my case, I wanted the weights to be updated after hitting them with noise. Yet, you can use torch.no_grad() to prevent the gradient from updating your weights. You can also use the detach() method, which constructs a new view on a tensor that is declared not to need gradients, i.e., it is excluded from further tracking of operations, so the subgraph involving this view will not be recorded.
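Both options sketched on a small stand-in layer (names are illustrative):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3)

# option 1: edit the weights in place without autograd tracking
with torch.no_grad():
    conv.weight.add_(torch.randn_like(conv.weight) * 0.01)

# option 2: detach() yields a view excluded from further operation tracking
frozen = conv.weight.detach()
print(frozen.requires_grad)  # False
```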