What is the proper way to set different learning rates for the different dense blocks of DenseNet? I am confused because the blocks are nested under the (features) section inside DenseNet.
The hierarchy is as follows:
DenseNet(
  (features): Sequential(
    (conv0): Conv2d (3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (relu0): ReLU(inplace)
    (pool0): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1))
    (denseblock1): _DenseBlock(),
    (transition1): _Transition(),
    ....
    (norm5)
  )
  (classifier):
)
Should I pass something like this to the optimizer?
[
{'params': densenet.features.conv0.parameters()},
{other layers},
{'params': densenet.features.denseblock1.parameters()},
{'params': densenet.features.transition1.parameters()},
....,
{'params': densenet.classifier.parameters()}]
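
For what it's worth, here is a minimal sketch of how a per-block list like the one above could be built without typing every layer by hand. Each 'params' entry has to be an iterable of tensors (for example module.parameters()), not the module itself, and each group can carry its own 'lr'. Everything named here is an assumption for illustration: TinyDenseNet is a small stand-in that only mimics the features/classifier nesting so the snippet runs quickly, build_param_groups is a hypothetical helper, and the learning-rate values are arbitrary. The same loop should apply to a real torchvision DenseNet, since it only relies on features.named_children() and classifier.

```python
from collections import OrderedDict

import torch
from torch import nn


class TinyDenseNet(nn.Module):
    """Stand-in model (assumption): same nesting as DenseNet, far fewer layers."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(OrderedDict([
            ("conv0", nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)),
            ("norm0", nn.BatchNorm2d(8)),
            ("relu0", nn.ReLU(inplace=True)),
            # Plain convolutions stand in for the real dense/transition blocks.
            ("denseblock1", nn.Conv2d(8, 8, kernel_size=3, padding=1)),
            ("transition1", nn.Conv2d(8, 8, kernel_size=1)),
            ("norm5", nn.BatchNorm2d(8)),
        ]))
        self.classifier = nn.Linear(8, 10)


def build_param_groups(model, lr_by_name, default_lr):
    """Hypothetical helper: one optimizer group per child of model.features."""
    groups = []
    for name, child in model.features.named_children():
        params = list(child.parameters())
        if not params:
            # Layers such as ReLU or pooling have nothing to optimize.
            continue
        groups.append({"params": params, "lr": lr_by_name.get(name, default_lr)})
    groups.append({
        "params": list(model.classifier.parameters()),
        "lr": lr_by_name.get("classifier", default_lr),
    })
    return groups


densenet = TinyDenseNet()

# Arbitrary example values; children not listed fall back to default_lr.
lr_by_name = {"denseblock1": 1e-4, "transition1": 1e-4, "classifier": 1e-2}
param_groups = build_param_groups(densenet, lr_by_name, default_lr=1e-3)

# The top-level lr is only used by groups that do not set their own.
optimizer = torch.optim.SGD(param_groups, lr=1e-3, momentum=0.9)

for group in optimizer.param_groups:
    print(len(group["params"]), "tensors, lr =", group["lr"])
```

Going through named_children() keeps the group list in sync with the model if blocks are added or removed, and layers without parameters are skipped so no empty group is passed to the optimizer.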