# Problem with L2 regularization excluding the bias

I have implemented a simple network with `nn.Module` and ran into a problem with regularization. The optimizer provides `weight_decay`, but it applies to all parameters. I want L2 regularization on the weights only, excluding the biases, since regularizing the biases may cause underfitting.
I calculated the regularization cost myself and separated the weights and biases. However, when I passed them to `torch.optim.Adam` via `param_groups`, the loss did not drop.
I wonder if there is anything wrong with my code.

```python
class CatNetwork(nn.Module):
    def __init__(self, in_dim, n_hidden_1, out_dim):
        super(CatNetwork, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Linear(in_dim, n_hidden_1),
            nn.ReLU(True)
        )
        self.layer2 = nn.Sequential(
            nn.Linear(n_hidden_1, out_dim),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        return x

    # Regularization cost
    def regularization(self, model, weight_decay):
        # parameter list: odict_keys(['layer1.0.weight', 'layer1.0.bias', 'layer2.0.weight', 'layer2.0.bias'])
        parameter_list = model.state_dict().keys()
        loss_reg = 0
        for para in parameter_list:
            loss_reg += torch.sum(torch.square(model.state_dict()[para]))
        return weight_decay * loss_reg

model = CatNetwork(n_input, 16, 1)

# Separate weights and biases
weight_list = []
bias_list = []
for i in range(1, 3):
    para_weight = 'layer' + str(i) + '.0.weight'
    para_bias = 'layer' + str(i) + '.0.bias'
    weight_list += [model.state_dict()[para_weight]]
    bias_list += [model.state_dict()[para_bias]]

# loss function
criterion = nn.BCELoss()

# optimizer
optimizer = torch.optim.Adam(
    [{'params': weight_list, 'weight_decay': 1},
     {'params': bias_list, 'weight_decay': 0}], lr=lr)

...
for i in range(epoch_number):
    out = model(x)
    loss = criterion(out, y) + model.regularization(model, weight_decay)  # loss + reg_cost
    # I'm not sure whether I should add the regularization cost, as I don't know
    # how the 'weight_decay' mechanism of the optimizer really works.
    # But it makes no difference to the result whether I keep it or drop it.
...
```

You could check whether the parameters are updated using the current approach, as I'm not sure your approach of using the `state_dict` to separate the weight and bias parameters works.
Directly accessing the parameters would work:

```python
weight_list = [p for n, p in model.named_parameters() if 'weight' in n]
bias_list = [p for n, p in model.named_parameters() if 'bias' in n]
```
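To verify that the parameters are actually updated, you could snapshot them before an optimizer step and compare afterwards. A minimal sketch with random data (the model and shapes here are placeholders, not your actual setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model with the same Linear/ReLU/Sigmoid structure
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(True),
                      nn.Linear(16, 1), nn.Sigmoid())

# Separate parameters by accessing them directly
weight_list = [p for n, p in model.named_parameters() if 'weight' in n]
bias_list = [p for n, p in model.named_parameters() if 'bias' in n]

optimizer = torch.optim.Adam(
    [{'params': weight_list, 'weight_decay': 1},
     {'params': bias_list, 'weight_decay': 0}], lr=1e-3)

# Snapshot the current parameter values
before = [p.detach().clone() for p in model.parameters()]

x = torch.randn(8, 4)
y = torch.rand(8, 1)
loss = nn.BCELoss()(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Every parameter tensor should have changed after the step
for p_old, p_new in zip(before, model.parameters()):
    assert not torch.equal(p_old, p_new)
```

If you build the param groups from `state_dict()` values instead, this check is how you would notice that the optimizer is stepping on tensors that are not the ones receiving gradients.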

Besides that, you are using two weight decay approaches: the `weight_decay` argument of the optimizer as well as your custom regularization, so you should also check whether you want to remove the `regularization` usage.
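If you rely on the optimizer's `weight_decay` alone and drop the custom term, the setup could look like the sketch below. For `torch.optim.Adam`, `weight_decay` adds `weight_decay * p` to each parameter's gradient before the update, which is L2 regularization folded into the gradient rather than an explicit loss term. The dimensions, learning rate, and decay value here are placeholder assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class CatNetwork(nn.Module):
    def __init__(self, in_dim, n_hidden_1, out_dim):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Linear(in_dim, n_hidden_1), nn.ReLU(True))
        self.layer2 = nn.Sequential(nn.Linear(n_hidden_1, out_dim), nn.Sigmoid())

    def forward(self, x):
        return self.layer2(self.layer1(x))

model = CatNetwork(4, 16, 1)  # 4 input features as a placeholder for n_input

# Decay the weights only; biases get weight_decay=0
decay, no_decay = [], []
for name, param in model.named_parameters():
    (no_decay if name.endswith('.bias') else decay).append(param)

optimizer = torch.optim.Adam(
    [{'params': decay, 'weight_decay': 1e-4},
     {'params': no_decay, 'weight_decay': 0.0}], lr=1e-2)

criterion = nn.BCELoss()
x, y = torch.randn(32, 4), torch.rand(32, 1)

loss_start = criterion(model(x), y).item()
for _ in range(50):
    optimizer.zero_grad()
    loss = criterion(model(x), y)  # no extra regularization term needed
    loss.backward()
    optimizer.step()
```

With this setup the loss term stays plain `BCELoss`; the decay is applied inside the optimizer step, and only to the `decay` group.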


Thanks a lot. I was even trying with `nn.Parameter` to figure it out, but failed.