Hello,
I would like to set an nn.Parameter
during training. What is the right way to do so without
breaking the connection to the optimizer (i.e. without creating a copy of the tensor)?
I tried to use torch.where,
but even if I use an nn.Parameter(torch.tensor(1.))
as the replacement value, it throws an error:
TypeError: cannot assign 'torch.FloatTensor' as parameter 'weight' (torch.nn.Parameter or None expected)
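Roughly what I am trying, as a minimal sketch (assuming the replacement value comes from torch.where, which always returns a plain tensor, so the exact names below are just illustrative):

import torch
import torch.nn as nn

model = nn.Linear(10, 10)

# torch.where returns a plain tensor, even if one of its inputs is an nn.Parameter,
# so assigning the result back to the module attribute raises the error above
new_weight = torch.where(model.weight > 0., model.weight, nn.Parameter(torch.tensor(1.)))
model.weight = new_weight  # TypeError: cannot assign 'torch.FloatTensor' as parameter 'weight'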
I’m unsure if I understand your use case correctly, but in case you want to directly manipulate a trainable parameter (i.e. without calculating gradients and taking an optimizer step), you could use a no_grad
context as seen here:
import torch
import torch.nn as nn

model = nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1.)

out = model(torch.randn(1, 10))
out.mean().backward()

print(model.weight.abs().sum())
# tensor(14.3759, grad_fn=<SumBackward0>)

optimizer.step()
print(model.weight.abs().sum())
# tensor(97.4504, grad_fn=<SumBackward0>)

model.zero_grad()

# overwrite the parameter values in-place; no_grad makes sure autograd does not track the copy
with torch.no_grad():
    model.weight.copy_(torch.ones_like(model.weight))
print(model.weight)

# make sure model is still updated
out = model(torch.randn(1, 10))
out.mean().backward()
print(model.weight.abs().sum())
# tensor(100., grad_fn=<SumBackward0>)

optimizer.step()
print(model.weight.abs().sum())
# tensor(100.3176, grad_fn=<SumBackward0>)
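Applied to your torch.where example, a minimal sketch (assuming you want to conditionally overwrite entries of the weight) could look like this:

import torch
import torch.nn as nn

model = nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1.)

with torch.no_grad():
    # torch.where may return a plain tensor here, which is fine,
    # since copy_ only overwrites the values of the existing parameter in-place
    new_values = torch.where(model.weight > 0., model.weight, torch.tensor(1.))
    model.weight.copy_(new_values)

# the parameter object itself is unchanged, so the optimizer still holds a reference to it
print(model.weight is next(model.parameters()))  # True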
import torch

class TinyModel(torch.nn.Module):
    def __init__(self):
        super(TinyModel, self).__init__()
        self.layer1 = torch.nn.Linear(1000, 100)
        self.relu = torch.nn.ReLU()
        self.layer2 = torch.nn.Linear(100, 10)

    def forward(self, x):
        with torch.no_grad():
            # add 1 to every entry of layer1.weight in-place before using it
            addition = torch.ones_like(self.layer1.weight)
            self.layer1.weight.copy_(self.layer1.weight + addition)
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        return x
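As a quick sanity check I would run something like this (dummy inputs and a made-up learning rate, just to see whether the parameter Adam holds is the one that changed):

model = TinyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

before = model.layer1.weight.detach().clone()
out = model(torch.randn(4, 1000))
out.mean().backward()
optimizer.step()

# layer1.weight is still the same Parameter object the optimizer was given,
# so both the in-place addition in forward and the Adam step should have changed it
print(torch.equal(before, model.layer1.weight.detach()))  # should print False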
So I could use the copy_
operation without ever running into the problem that the new weights are not registered by Adam (as long as I keep the shape of model.weight unchanged)?