Hi,
I am trying to implement different learning rates across my network. I am creating the parameter groups as follows:
Simple optimizer:
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
Optimizer with parameter groups:
optimizer1 = optim.SGD([
    {'params': net.top_model[0:10].parameters(), 'lr': learning_rate/10, 'momentum': 0},
    {'params': net.top_model[10:31].parameters(), 'lr': learning_rate/3},
    {'params': net.linear1.parameters()},
    {'params': net.bn1.parameters()},
], lr=learning_rate)
When I do
len(optimizer.param_groups[0]['params']) # I get 30
len(optimizer1.param_groups[0]['params']) # I get 8
I don’t understand how PyTorch arrives at these numbers.
Could someone clarify whether this is the right way to do it, and explain the difference in the param_groups counts?
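To show what I am looking at, here is a minimal sketch of how I inspect which tensors end up in the first group (nothing beyond standard nn.Module calls):

for name, p in net.top_model[0:10].named_parameters():
    print(name, tuple(p.shape))
print(len(list(net.top_model[0:10].parameters())))  # prints 8, the count above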
Thanks a ton!
My network:
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import models

Pretrained = True  # whether to load the ImageNet weights

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        vgg = models.vgg16(pretrained=Pretrained)
        layers = list(vgg.children())[0][:31]  # the 31 modules of vgg.features
        self.top_model = nn.Sequential(*layers).cuda()
        self.bn1 = nn.BatchNorm1d(512)
        self.linear1 = nn.Linear(512, 10)

    def forward(self, x):
        x = F.relu(self.top_model(x))
        x = nn.AdaptiveAvgPool2d((1, 1))(x)
        x = x.view(x.shape[0], -1)
        x = self.bn1(x)
        x = self.linear1(x)
        return x
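For completeness, here is the smoke test I run against the model (a minimal sketch: it assumes a CUDA device, since top_model is moved to the GPU, and ImageNet-sized inputs):

import torch

net = Net().cuda()
x = torch.randn(2, 3, 224, 224).cuda()  # dummy batch of two 224x224 RGB images
print(net(x).shape)                     # expect torch.Size([2, 10])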