# Model parameters as sum of variable and constant weight

Hello everyone,
I am trying to train a model whose parameters are a sum of two sets of weights, and I want to train only one of the sets. That is, every weight is w = w_1 + w_2, where w_2 is a constant and w_1 is to be optimized. Thus, in the forward step w is used to compute the loss, but in the backward step only w_1 is adjusted. Is this possible?

It is possible.

Just call `w1.requires_grad_(True)` and `w2.requires_grad_(False)` (or, equivalently, pass `requires_grad` when constructing the parameters).

Look at the example below.

```python
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # w1 is trainable; w2 is frozen
        self.w1 = torch.nn.Parameter(torch.tensor(1.), requires_grad=True)
        self.w2 = torch.nn.Parameter(torch.tensor(2.), requires_grad=False)

    def forward(self, x):
        w = self.w1 + self.w2
        return w * x

def mae(y, y_pred):
    """mean absolute error"""
    return (y - y_pred).abs().mean()

# data
x = torch.tensor([1., 3., 3.])
y = 2 * x
# model
model = Model()
```
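A complete training step might then look like the sketch below (the choice of plain SGD and `lr=0.1` is mine, not from the original post). It repeats the model definition so it runs on its own, and checks afterwards that `w2` never moved:

```python
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # w1 is trainable; w2 is frozen
        self.w1 = torch.nn.Parameter(torch.tensor(1.), requires_grad=True)
        self.w2 = torch.nn.Parameter(torch.tensor(2.), requires_grad=False)

    def forward(self, x):
        return (self.w1 + self.w2) * x

x = torch.tensor([1., 3., 3.])
y = 2 * x
model = Model()

# hypothetical optimizer choice; any torch.optim optimizer works the same way
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

w2_before = model.w2.item()
for _ in range(50):
    optimizer.zero_grad()
    loss = (y - model(x)).abs().mean()  # MAE loss
    loss.backward()   # gradient reaches only w1; w2.grad stays None
    optimizer.step()  # only w1 is updated

assert model.w2.item() == w2_before  # w2 unchanged
assert model.w2.grad is None         # autograd never computed a gradient for w2
```

Passing `model.parameters()` directly is fine even though it includes `w2`: the optimizer simply skips any parameter whose `.grad` is `None`.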

In this case, `optimizer.step()` will update only `w1`. Conceptually, SGD computes `w1 = w1 - learning_rate * w1.grad`, which changes `w1`. For `w2`, autograd treats the parameter as a constant in the computation graph, so no gradient is ever computed for it: `w2.grad` stays `None` and the optimizer skips it entirely.