Wrapping a registered nn.Parameter in another tensor doesn't train the given parameter

Hey I’m initializing a trainable parameter and adding it to the optimizer like so:

import torch
import torch.nn as nn
from torch.optim import Adam

lamb = nn.Parameter(torch.tensor(0.0, requires_grad=True, device=device, dtype=torch.float32))
params = [
    {'params': net.parameters(), 'lr': 1e-3},
    {'params': lamb, 'lr': 1e-3}
]

optimizer = Adam(params)

I want to use this parameter as a trainable weight for some matrix-vector multiplications, where only lamb should be learned. That's why I wrap lamb into another tensor:

xi = torch.tensor([[lamb], [-1], [1]], requires_grad=True, device=device, dtype=torch.float32)

But if I train with that, my parameter lamb doesn't get updated at all, even though the optimizer is fully aware of it. Am I missing something, or is there a better way to do something like this?
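
For reference, here is a quick check (a minimal, self-contained sketch; the "cpu" device and the dummy xi.sum() backward call are just for illustration) showing that no gradient ever reaches lamb:

import torch
import torch.nn as nn

device = 'cpu'  # illustrative; any device behaves the same way
lamb = nn.Parameter(torch.tensor(0.0, device=device))

xi = torch.tensor([[lamb], [-1], [1]], requires_grad=True, device=device, dtype=torch.float32)

print(xi.grad_fn)   # prints None here
xi.sum().backward()
print(lamb.grad)    # also None -> lamb never receives a gradient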

Thanks in advance!

torch.tensor copies the values of its inputs and creates a new leaf tensor, so wrapping lamb this way detaches it from the computation graph. You could use torch.stack instead, which should yield a valid gradient in lamb:

lamb = nn.Parameter(torch.tensor(0.0, requires_grad=True, device=device, dtype=torch.float32))
a = torch.tensor([-1], requires_grad=True, device=device, dtype=torch.float32)
b = torch.tensor([1], requires_grad=True, device=device, dtype=torch.float32)

xi = torch.stack((lamb.unsqueeze(0), a, b))   # keeps lamb, a, b in the computation graph
out = xi * torch.randn(1, device=device)      # dummy op standing in for your actual computation
out.mean().backward()
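
You can verify that the graph is intact by checking the gradients right after the backward call; in this sketch all three leaves receive a gradient (the randn factor is just a placeholder for the real computation):

print(xi.grad_fn)       # a StackBackward node -> xi is part of the graph
print(lamb.grad)        # a finite value now, not None
print(a.grad, b.grad)   # the other leaves also receive gradients here

Note that in an actual training loop xi would have to be rebuilt from lamb in every iteration, so that it always reflects the parameter's updated value.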

I tried implementing it this way, but when printing xi it shows that lamb still isn't being optimized…

It: 0/200000, Loss: 1.04499, xi: [0.][-1.][1.]
It: 1/200000, Loss: 0.84020, xi: [0.][-1.][1.]
It: 2/200000, Loss: 0.66697, xi: [0.][-1.][1.]
It: 3/200000, Loss: 0.52373, xi: [0.][-1.][1.]
It: 4/200000, Loss: 0.40832, xi: [0.][-1.][1.]
It: 5/200000, Loss: 0.31867, xi: [0.][-1.][1.]
It: 6/200000, Loss: 0.25264, xi: [0.][-1.][1.]
It: 7/200000, Loss: 0.20781, xi: [0.][-1.][1.]
It: 8/200000, Loss: 0.18126, xi: [0.][-1.][1.]
It: 9/200000, Loss: 0.16949, xi: [0.][-1.][1.]
It: 10/200000, Loss: 0.16853, xi: [0.][-1.][1.]
It: 11/200000, Loss: 0.17435, xi: [0.][-1.][1.]
It: 12/200000, Loss: 0.18329, xi: [0.][-1.][1.]
It: 13/200000, Loss: 0.19237, xi: [0.][-1.][1.]
It: 14/200000, Loss: 0.19953, xi: [0.][-1.][1.]
...

I got a hacky workaround by optimizing the whole vector xi and manually setting the gradients of the other components to 0 after loss.backward(), like so:

xi = nn.Parameter(torch.tensor([[0], [-1], [1]], requires_grad=True, device=device, dtype=torch.float32))
params = [
    {'params': net.parameters(), 'lr': 1e-3},
    {'params': xi, 'lr': 1e-3}
]

optimizer = Adam(params)
while training:
    optimizer.zero_grad()
    # some stuff and computing loss...
    loss.backward()

    xi.grad[1:3] = 0  # keep the -1 and 1 entries fixed
    optimizer.step()

But this doesn’t really feel right…
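
One possible alternative (just a sketch under the same assumptions as the snippets above, i.e. net, loss, training and device come from your code, and the mask matches the (3, 1) shape of xi) is to register a backward hook on xi once, so the fixed components are masked automatically instead of being zeroed by hand:

mask = torch.tensor([[1.], [0.], [0.]], device=device)   # only the first entry should learn
xi = nn.Parameter(torch.tensor([[0.], [-1.], [1.]], device=device))
xi.register_hook(lambda grad: grad * mask)                # runs on every backward pass

optimizer = Adam([
    {'params': net.parameters(), 'lr': 1e-3},
    {'params': xi, 'lr': 1e-3}
])
while training:
    optimizer.zero_grad()
    # some stuff and computing loss...
    loss.backward()   # the hook zeroes the gradient of the -1 and 1 entries
    optimizer.step()

With Adam's default settings (no weight decay) a zero gradient means those entries never move, so this behaves the same as the manual masking.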

My code snippet should yield valid gradients for all tensors in xi.
Based on your initial code snippet it seemed you wanted gradients for all sub-tensors, since you were initializing them with requires_grad=True.
If you want to keep a portion of a tensor constant without updating it, your new approach seems to be the right way to do so.
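
For completeness, a minimal end-to-end sketch of the torch.stack approach (dummy net, data, and loss term; all of those names are illustrative): xi is rebuilt from lamb inside the loop, so it always uses the parameter's current value, and lamb visibly changes from iteration to iteration:

import torch
import torch.nn as nn
from torch.optim import Adam

device = 'cpu'                                    # illustrative
net = nn.Linear(3, 1).to(device)                  # stand-in for the real net
lamb = nn.Parameter(torch.tensor(0.0, device=device))
a = torch.tensor([-1.], device=device)            # fixed entries, no grad needed
b = torch.tensor([1.], device=device)

optimizer = Adam([
    {'params': net.parameters(), 'lr': 1e-3},
    {'params': lamb, 'lr': 1e-3}
])

for it in range(5):
    optimizer.zero_grad()
    xi = torch.stack((lamb.unsqueeze(0), a, b))   # (3, 1); lamb stays in the graph
    loss = net(torch.randn(8, 3, device=device)).mean() + (xi.sum() - 1.0) ** 2
    loss.backward()
    optimizer.step()
    print(it, lamb.item())                        # lamb moves away from 0.0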
