Given the pseudo model below:
import torch
import torch.nn as nn

class model(nn.Module):
    def __init__(self):
        super().__init__()
        # user-defined parameters, trained by their own optimizer
        self.alpha_params1 = nn.Parameter(torch.randn(<size>), requires_grad=True)
        self.alpha_params2 = nn.Parameter(torch.randn(<size>), requires_grad=True)
        <typical Conv2d layers> ...

    def forward(self, x):
        <feed forward>
        return output

    def net_parameters(self, filtered_name='alpha_', recurse=True):
        # yield the regular layer parameters, i.e. everything except the alpha_* tensors
        for name, param in self.named_parameters(recurse=recurse):
            if filtered_name not in name:
                yield param

    def extra_params(self):
        # return the user-defined alpha parameters
        return [self.alpha_params1, self.alpha_params2]
So above is my pseudo model code.
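To be explicit about the intent of net_parameters() / extra_params(): they split the model's parameters into two disjoint groups, one per optimizer. A quick check of that split (a sketch using the names from the class above):

m = model()
# layer weights only, alpha_* excluded; these go to the first optimizer
print([name for name, _ in m.named_parameters() if 'alpha_' not in name])
# the two user-defined alpha tensors; these go to the second optimizer
print([p.shape for p in m.extra_params()])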
from torch.optim import Adam

net = model()  # instantiate the model above
optimizer_1 = Adam(net.net_parameters(), lr=0.001, ...)  # regular layer weights
optimizer_2 = Adam(net.extra_params(), lr=0.003, ...)    # alpha parameters
criterion = nn.L1Loss()
##
# typical Apex Distributed Data Parallel initialization
##
for epoch in range(epochs):
    for data1, data2 in dataloader:
        # first pass: loss on data1, stepped with optimizer_1
        output = net(data1.data)
        loss = criterion(output, data1.gt)
        loss.backward()
        optimizer_1.zero_grad()
        optimizer_1.step()

        # second pass: loss on data2, stepped with optimizer_2
        output2 = net(data2.data)
        loss2 = criterion(output2, data2.gt)
        loss2.backward()
        optimizer_2.zero_grad()
        optimizer_2.step()

    # save a checkpoint every 10 epochs
    if (epoch + 1) % 10 == 0:
        torch.save(net.module.state_dict(), f"state_dict_{epoch}.pth")
        torch.save(net.module.extra_params(), f"extra_params_{epoch}.pth")
Above is my pseudo code for model instantiation and training.
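The Apex part glossed over above is along these lines (a minimal sketch; the opt_level and DDP settings are placeholders, not my exact configuration):

from apex import amp
from apex.parallel import DistributedDataParallel as DDP

# register the model and both optimizers with amp, then wrap for distributed training
net, [optimizer_1, optimizer_2] = amp.initialize(net, [optimizer_1, optimizer_2], opt_level="O1")
net = DDP(net)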
Every 10 epochs I checkpoint by saving model.state_dict() as well as the model's alpha parameters separately. I then compare the values of the alpha parameters between different epochs. What I found is that the alpha parameters from different epochs are identical in value, and so are the model's weights. It seems no update is happening at all. Any help is appreciated.
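For reference, this is roughly how I compare the saved alpha parameters between two checkpoints (the epoch numbers here are just placeholders):

# load the alpha parameters saved at two different epochs
alphas_a = torch.load("extra_params_9.pth")
alphas_b = torch.load("extra_params_19.pth")
for p_a, p_b in zip(alphas_a, alphas_b):
    # prints True for every pair, i.e. no update between checkpoints
    print(torch.equal(p_a, p_b))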