Trouble with Param grouping

Jaideep_Valani · May 12, 2019, 8:46am

Hi all,
I am using the code below to build param groups from a given list of break points.
Say I have 5 Sequential layers in a model, and each of these in turn has sub-children, as shown below.
After building the groups I pass them to the optimizer. The problem is that my accuracy is much worse than with the single default group that holds all of the model's parameters.
Can someone let me know whether I am doing the grouping correctly, in a way that should logically work?
Model details
Sequential(
(0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 16, 128, 128])

Sequential(
(0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 32, 64, 64])

Sequential(
(0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 64, 32, 32])

Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 64, 16, 16])

Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 64, 8, 8])

Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 128, 4, 4])

Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): GeneralRelu()
(2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
torch.Size([128, 256, 2, 2])

AdaptiveAvgPool2d(output_size=1)
torch.Size([128, 256, 1, 1])

Lambda()
torch.Size([128, 256])

Linear(in_features=256, out_features=10, bias=True)
torch.Size([128, 10])

Code for building groups

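groups = []
param = {}
break_points = [0, 7]  # also tried [0, 1, 7]
for i, j in enumerate(break_points):
  # First break point is not 0: add a leading group for layers 0..j.
  if j != 0 and i == 0:
    param['momentum'] = 0.9
    param['params'] = model[0:j].parameters()
    groups.append(param)
    param = {}
  # Last break point: group everything from j to the end, then stop.
  if i + 1 == len(break_points):
    param['params'] = model[j:].parameters()
    param['momentum'] = 0.9
    groups.append(param)
    break

  # Otherwise group the layers between this break point and the next one.
  param['params'] = model[j:break_points[i + 1]].parameters()
  param['momentum'] = 0.9
  groups.append(param)
  param = {}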
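
For comparison, the same split can be sketched as pairs of slice bounds. This is only an illustration, not the code from the post: `group_bounds` and `n_layers` are hypothetical names, and `n_layers=10` assumes the ten top-level modules printed above.

```python
def group_bounds(break_points, n_layers):
    """Return (start, stop) index pairs that cover 0..n_layers, cut at break_points."""
    # Always include 0 and n_layers as cut points; a set removes duplicates
    # such as an explicit leading 0, and out-of-range points are ignored.
    cuts = sorted({0, n_layers, *(b for b in break_points if 0 < b < n_layers)})
    # Pair each cut point with the next one to get consecutive ranges.
    return list(zip(cuts[:-1], cuts[1:]))


print(group_bounds([0, 7], 10))  # [(0, 7), (7, 10)]
print(group_bounds([3, 7], 10))  # [(0, 3), (3, 7), (7, 10)]
```

Each pair could then become one group, with `model[start:stop].parameters()` as its `params` entry and the same `momentum` as in the loop above.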

ptrblck · May 12, 2019, 11:07pm

The first condition will never be met as far as I see it, since i==0 only in the first iteration, where your break_points list contains a 0, so that j!=0 will be False.
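
A small check of that condition on its own, as a sketch. `first_branch_taken` is a hypothetical helper that only evaluates the `if j!=0 and i==0` test for each iteration and does not touch the model.

```python
def first_branch_taken(break_points):
    """Return True if `j != 0 and i == 0` holds in any iteration of the loop."""
    return any(j != 0 and i == 0 for i, j in enumerate(break_points))


print(first_branch_taken([0, 7]))     # False
print(first_branch_taken([0, 1, 7]))  # False
print(first_branch_taken([3, 7]))     # True
```

The branch is only reachable when the list does not start with 0.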

Jaideep_Valani · May 13, 2019, 11:14am

Hi ptrblck,
Thanks for the reply. The first condition is for the case where the break points are, say, 3 and 7: the first group then has to be 0 to 3, the second 3 to 7, and the last 7 to the end. If I pass 0 and 7, the groups would be 0 to 7 and 7 to the end.
I am not sure whether I am grouping the params correctly, as my model performs worse with these groups than with the one default group of all params.
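
One way to sanity-check the grouping itself is to confirm that every parameter lands in exactly one group. The sketch below is an assumption-laden illustration: `check_groups` is a hypothetical helper, and it expects each group's `params` to be a list rather than a generator, since iterating a generator here would exhaust it before the optimizer sees it. Plain objects stand in for parameters so the example runs without a model.

```python
def check_groups(all_params, groups):
    """Return True if the groups contain each of all_params exactly once."""
    # Compare by identity, since the groups should hold the very same objects.
    expected = sorted(id(p) for p in all_params)
    seen = sorted(id(p) for g in groups for p in g["params"])
    # Equal sorted lists mean nothing is missing, extra, or duplicated
    # (assuming all_params itself has no repeats).
    return expected == seen


# Stand-in objects instead of real parameters (assumption for illustration).
params = [object() for _ in range(5)]
groups = [
    {"params": params[:3], "momentum": 0.9},
    {"params": params[3:], "momentum": 0.9},
]
print(check_groups(params, groups))      # True
print(check_groups(params, groups[:1]))  # False
```

With a real model, `all_params` would be `list(model.parameters())` and each group's `params` would be wrapped in `list(...)` before the check.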