Set nn.Parameter during training

Hello,

I would like to set an nn.Parameter during training. What is the right way to do so without
breaking the connection to the optimizer (i.e. without creating a copy of the tensor)?

I tried to use torch.where, but even if I use an nn.Parameter(torch.tensor(1.)) as the replacement value, it throws an error:

TypeError: cannot assign 'torch.FloatTensor' as parameter 'weight' (torch.nn.Parameter or None expected)
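
For context, a minimal sketch of how this error can come up (the exact code is not in the post, so the assignment below is an assumption):

import torch
import torch.nn as nn

model = nn.Linear(10, 10)

# torch.where returns a plain tensor even if the replacement value is an
# nn.Parameter, so assigning the result back to the module attribute fails.
replacement = nn.Parameter(torch.tensor(1.))
new_weight = torch.where(model.weight > 0., model.weight, replacement)

model.weight = new_weight
# TypeError: cannot assign 'torch.FloatTensor' as parameter 'weight'
# (torch.nn.Parameter or None expected)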

I’m unsure if I understand your use case correctly, but in case you want to directly manipulate a trainable parameter (i.e. without calculating gradients or using an optimizer), you could use a no_grad() context as seen here:

import torch
import torch.nn as nn

model = nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1.)

out = model(torch.randn(1, 10))
out.mean().backward()

print(model.weight.abs().sum())
# tensor(14.3759, grad_fn=<SumBackward0>)
optimizer.step()
print(model.weight.abs().sum())
# tensor(97.4504, grad_fn=<SumBackward0>)

model.zero_grad()
# manipulate the parameter in-place without Autograd tracking the change
with torch.no_grad():
    model.weight.copy_(torch.ones_like(model.weight))
print(model.weight)

# make sure model is still updated
out = model(torch.randn(1, 10))
out.mean().backward()

print(model.weight.abs().sum())
# tensor(100., grad_fn=<SumBackward0>)
optimizer.step()
print(model.weight.abs().sum())
# tensor(100.3176, grad_fn=<SumBackward0>)

import torch

class TinyModel(torch.nn.Module):

    def __init__(self):
        super(TinyModel, self).__init__()

        self.layer1 = torch.nn.Linear(1000, 100)
        self.relu = torch.nn.ReLU()
        self.layer2 = torch.nn.Linear(100, 10)

    def forward(self, x):
        with torch.no_grad():
            # add 1. to every entry of layer1's weight in-place
            addition = torch.ones_like(self.layer1.weight)
            self.layer1.weight.copy_(self.layer1.weight + addition)

        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        return x

So I could use the copy operation without running into trouble with the new weights not being registered by Adam, as long as I keep the shape of model.weight unchanged?
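
A quick way to check the registration part (a sketch built on the example above, not from the original post): copy_ modifies the tensor in place, so the optimizer still holds a reference to the very same parameter object.

import torch
import torch.nn as nn

model = nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1.)

w_before = model.weight
with torch.no_grad():
    model.weight.copy_(torch.ones_like(model.weight))

# the parameter object is unchanged; only its values were overwritten
print(w_before is model.weight)                                # True
print(model.weight is optimizer.param_groups[0]['params'][0])  # True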

Hi, I did a similar operation, but it seems that if the optimizer is not handled as well, the parameters fall back to their previous values.
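
One possible explanation, sketched below under the assumption that an optimizer with internal state (e.g. Adam) is used; this is not confirmed in the thread: the running averages were accumulated before the manual copy, so the first step afterwards still applies an update based on the old gradient history, which can immediately pull the parameter away from the freshly copied values.

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1.)

# build up Adam's running averages with a few regular steps
for _ in range(5):
    optimizer.zero_grad()
    model(torch.randn(1, 10)).mean().backward()
    optimizer.step()

# manually overwrite the weight
with torch.no_grad():
    model.weight.copy_(torch.ones_like(model.weight))
print(model.weight.abs().sum())  # tensor(100., ...)

# the very next step still uses the previously accumulated state,
# so the weight immediately moves away from the copied values
optimizer.zero_grad()
model(torch.randn(1, 10)).mean().backward()
optimizer.step()
print(model.weight.abs().sum())  # no longer exactly 100.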