SGD: specifying lr in SGD constructor vs. per-param -- why different behavior?

Hi,

The following two pieces of code produce different training results and I don’t understand why. My understanding is that named_parameters() returns all model params (trainable or not). What is the difference between specifying the learning rate on each of them vs. in the SGD constructor?

Code #1:

def get_optimizer(self):
    lr = opt.lr
    params = []
    for key, value in dict(self.named_parameters()).items():
        if value.requires_grad:
            if 'bias' in key:
                params += [{'params': [value], 'lr': lr * 1, 'weight_decay': 0}]
            else:
                params += [{'params': [value], 'lr': lr, 'weight_decay': 0}]
    self.optimizer = t.optim.SGD(params, momentum=0.9)
    return self.optimizer

Code #2:

def get_optimizer(self):
    self.optimizer = t.optim.SGD(self.parameters(), lr=opt.lr, momentum=0.9)
    return self.optimizer

parameters() and named_parameters() iterate over the same parameter tensors.
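
A minimal check on a throwaway nn.Linear (just a stand-in here, not my actual model) illustrates what I'm assuming:

import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in model

plain = list(model.parameters())
named = [p for _, p in model.named_parameters()]

# same tensors, in the same order; named_parameters() just adds the names
print(all(a is b for a, b in zip(plain, named)) and len(plain) == len(named))  # True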

Thank you,

Bart

Both approaches work identically, as seen here:

import torch
import torchvision.models as models

def get_optimizerA(model):
    # per-parameter groups, each carrying its own lr
    lr = 1.
    params = []
    for key, value in model.named_parameters():
        if value.requires_grad:
            params += [{'params': [value], 'lr': lr, 'weight_decay': 0}]
    optimizer = torch.optim.SGD(params, momentum=0.9)
    return optimizer

def get_optimizerB(model):
    # single lr passed once to the constructor
    lr = 1.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    return optimizer

# setup: two models with identical weights
modelA = models.resnet18().eval()
modelB = models.resnet18().eval()
modelB.load_state_dict(modelA.state_dict())
x = torch.randn(2, 3, 224, 224)

optimizerA = get_optimizerA(modelA)
optimizerB = get_optimizerB(modelB)

outA = modelA(x)
outB = modelB(x)

lossA = outA.mean()
lossB = outB.mean()
print((lossA - lossB).abs())

lossA.backward()
lossB.backward()

optimizerA.step()
optimizerB.step()

# compare the parameters after one update step
for (nameA, paramA), (nameB, paramB) in zip(modelA.named_parameters(), modelB.named_parameters()):
    if not nameA == nameB:
        print("Error in iterating param dicts")
        break
    print('{}: abs.max error {}'.format(nameA, (paramA - paramB).abs().max()))
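
In other words, when every param group dict already carries its own 'lr', the constructor doesn't need a default learning rate at all, and both setups end up with the same per-group lr. A quick inspection of optimizer.param_groups (reusing optimizerA and optimizerB from the snippet above) shows this:

# both optimizers end up with lr=1.0 in every param group
print(set(group['lr'] for group in optimizerA.param_groups))  # {1.0}
print(set(group['lr'] for group in optimizerB.param_groups))  # {1.0}

# optimizerA holds one group per parameter, optimizerB a single group with all parameters
print(len(optimizerA.param_groups), len(optimizerB.param_groups))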

Could you post an executable code snippet that reproduces the issue you are seeing?

Thanks – you are correct, they are identical. My apologies. I'm not sure how I got it into that state, but I can no longer reproduce the issue.