Hi all,
I am using the code below to build optimizer param groups from a given list of break points.
Say I have a model made of several Sequential blocks (seven in the printout below), each with its own child layers.
After building the groups I pass them to the optimizer. The problem is that my accuracy is much worse than with the single default group that contains all of the model's parameters.
Can someone let me know whether I am building the groups correctly, in a way that should logically work?
Model details
Sequential(
(0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 16, 128, 128])
Sequential(
(0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 32, 64, 64])
Sequential(
(0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 64, 32, 32])
Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 64, 16, 16])
Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 64, 8, 8])
Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 128, 4, 4])
Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 256, 2, 2])
AdaptiveAvgPool2d(output_size=1)
torch.Size([128, 256, 1, 1])
Lambda()
torch.Size([128, 256])
Linear(in_features=256, out_features=10, bias=True)
torch.Size([128, 10])
Code for building groups
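groups = []
param = {}
break_points = [0, 7]  # also tried [0, 1, 7]
# `model` is the nn.Sequential printed above; slicing it returns a sub-Sequential.
for i, j in enumerate(break_points):
    # If the first break point is not 0, put the leading layers in their own group.
    if j != 0 and i == 0:
        param['momentum'] = 0.9
        param['params'] = model[0:j].parameters()
        groups.append(param)
        param = {}
    # Last break point: everything from j to the end of the model.
    if i + 1 == len(break_points):
        param['params'] = model[j:].parameters()
        param['momentum'] = 0.9
        groups.append(param)
        break
    # Otherwise: layers from this break point up to the next one.
    param['params'] = model[j:break_points[i + 1]].parameters()
    param['momentum'] = 0.9
    groups.append(param)
    param = {}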
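
For comparison, here is a minimal sketch of the same break-point grouping written more compactly, plus a check that every parameter ends up in some group. The helper names slice_bounds, build_param_groups and count_uncovered are hypothetical (not part of PyTorch), and the sketch assumes the model supports len() and slicing and that each slice exposes .parameters(), as nn.Sequential does. It is meant as a way to sanity-check the grouping, not as an explanation of the accuracy drop.

```python
def slice_bounds(break_points, n_layers):
    """Turn break points into (start, stop) index pairs covering 0..n_layers."""
    # Always start at 0 and end at n_layers; duplicates and out-of-range values are dropped.
    cuts = sorted({0, n_layers, *(b for b in break_points if 0 < b < n_layers)})
    return list(zip(cuts[:-1], cuts[1:]))


def build_param_groups(model, break_points, **hyperparams):
    """Build one optimizer param group per slice of a sliceable model.

    Assumption: `model` supports len() and slicing, and each slice has
    .parameters(), as torch.nn.Sequential does.
    """
    groups = []
    for start, stop in slice_bounds(break_points, len(model)):
        # Materialize the generator so the group can be inspected more than once.
        params = list(model[start:stop].parameters())
        if params:  # skip slices that hold no parameters (e.g. only pooling layers)
            groups.append({"params": params, **hyperparams})
    return groups


def count_uncovered(model, groups):
    """Count parameters of `model` that appear in no group (0 means full coverage)."""
    grouped = {id(p) for g in groups for p in g["params"]}
    return sum(1 for p in model.parameters() if id(p) not in grouped)


# The model printed above has 10 top-level children (7 Sequential blocks,
# the pooling layer, the Lambda and the Linear layer).
print(slice_bounds([0, 7], 10))     # [(0, 7), (7, 10)]
print(slice_bounds([0, 1, 7], 10))  # [(0, 1), (1, 7), (7, 10)]
```

With the real model, something like groups = build_param_groups(model, [0, 7], momentum=0.9) followed by count_uncovered(model, groups) should return 0 if the groups cover the same parameters as the default single group.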