I’m not sure your manual `param_groups` manipulation works, so I would recommend using `add_param_group` instead:
model.l1.register_parameter('t', torch.nn.Parameter(torch.Tensor([0.05]).to(device)))
opt.add_param_group({'params': [model.l1.t]})
Afterwards I get this output:
Train Epoch: 0 [576/60000 (1%)] Loss: -0.093652
Train Epoch: 1 [576/60000 (1%)] Loss: -0.130155
Adadelta (
Parameter Group 0
eps: 1e-06
initial_lr: 0.001
lr: 0.00049
rho: 0.9
weight_decay: 0
Parameter Group 1
eps: 1e-06
lr: 0.001
rho: 0.9
weight_decay: 0
)
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* None
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1296], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1662], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.0712], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1003], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.0827], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.0788], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1489], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1556], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1602], device='cuda:0')
Train Epoch: 2 [576/60000 (1%)] Loss: -0.007945
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1588], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.0828], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1081], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1140], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1470], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1953], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1221], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.1119], device='cuda:0')
Parameter containing:
tensor([0.0500], device='cuda:0', requires_grad=True) *grad* tensor([-0.0748], device='cuda:0')
Parameter containing:
tensor([0.0501], device='cuda:0', requires_grad=True) *grad* tensor([-0.1101], device='cuda:0')
Train Epoch: 3 [576/60000 (1%)] Loss: -0.008487
which shows that the new parameter receives gradients and is being updated (its value moves from 0.0500 to 0.0501).
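As a standalone sketch of the same idea (using an assumed toy `Linear` model rather than your code): register the parameter on the module after the optimizer was created, then hand it to the optimizer via `add_param_group` and verify it gets a gradient and an update after one step.

```python
import torch

torch.manual_seed(0)

# Toy setup (assumption: your actual model/optimizer differ).
model = torch.nn.Linear(4, 2)
opt = torch.optim.Adadelta(model.parameters(), lr=1.0)

# Register a new parameter after the optimizer already exists...
model.register_parameter('t', torch.nn.Parameter(torch.tensor([0.05])))
# ...and add it to the optimizer in a new param group so it is updated.
opt.add_param_group({'params': [model.t]})

before = model.t.item()
out = model(torch.randn(8, 4)) * model.t  # make t participate in the loss
loss = out.sum()
loss.backward()
opt.step()

print(model.t.grad is not None)   # the new parameter received a gradient
print(model.t.item() != before)   # the new parameter was updated
```

The key point is that an optimizer only touches parameters it was given; parameters registered after construction have to be added explicitly with `add_param_group`.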