I want to add an L1 regularization term to my loss, but I am a little bit confused.

When I print the shapes of the CNN's parameters, each one has a different number of dimensions.

```
>>> [print(p.shape) for p in model.parameters()]
torch.Size([64, 3, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([128, 64, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 64, 1, 1])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
...
```

**1) Use torch.linalg.norm**

```
# note: "lambda" is a reserved word in Python, so I use lambda_ instead
cost += lambda_ * sum(torch.linalg.norm(p, ord=1) for p in model.parameters())
```

When I use torch.linalg.norm, I have to choose a dim:

```
>>> sum(torch.linalg.norm(p, 1) for p in model.parameters())
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "<string>", line 1, in <genexpr>
RuntimeError: 'dim' must specify 1 or 2 dimensions when order is numerical and input is not 1-D or 2-D
```

But when I choose dimension 0, the outputs have different shapes, so summing them is impossible:

```
>>> sum(torch.linalg.norm(p, 1, 0) for p in model.parameters())
Traceback (most recent call last):
File "<string>", line 1, in <module>
RuntimeError: The size of tensor a (3) must match the size of tensor b (64) at non-singleton dimension 0
```

How can I add an L1 regularization term to the loss using torch.linalg.norm?
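One workaround that seems to run is to flatten each parameter to 1-D before taking the norm, since ord=1 is accepted for 1-D inputs. A minimal sketch (the two-layer model here is just a hypothetical stand-in for my CNN, and lambda_ = 1e-4 is an arbitrary strength):

```python
import torch
import torch.nn as nn

# hypothetical stand-in for the CNN whose parameter shapes are printed above
model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.BatchNorm2d(64))

lambda_ = 1e-4  # arbitrary regularization strength, for illustration only

# flattening makes every parameter 1-D, so ord=1 no longer needs a dim
l1 = sum(torch.linalg.norm(p.flatten(), ord=1) for p in model.parameters())
cost = lambda_ * l1
```

But I am not sure whether flattening is the intended way to use this function.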

**2) Use torch.Tensor.norm**

```
# again using lambda_ because "lambda" is a reserved word
cost += lambda_ * sum(p.norm(1) for p in model.parameters())
```

It works, but the resulting value is very large. Is this the right way to add an L1 regularization term to the loss?

```
>>> sum(p.norm(1) for p in model.parameters())
tensor(111630.8516, device='cuda:0', grad_fn=<AddBackward0>)
```

**3) What's the difference between torch.linalg.norm and torch.Tensor.norm?**
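From what I can tell, the two agree on 1-D input, while on higher-dimensional input torch.linalg.norm demands a dim for numerical orders, and Tensor.norm silently flattens. torch.linalg.vector_norm (if I understand its default correctly) flattens over all elements unless a dim is given, so it may be the closest modern replacement:

```python
import torch

v = torch.randn(10)
t = torch.randn(4, 3, 3, 3)

# on 1-D input, both functions give the same L1 norm
assert torch.allclose(torch.linalg.norm(v, ord=1), v.norm(1))

# Tensor.norm(1) silently flattens n-D input ...
assert torch.allclose(t.norm(1), t.abs().sum())

# ... while torch.linalg.vector_norm flattens by default, explicitly
assert torch.allclose(torch.linalg.vector_norm(t, ord=1), t.abs().sum())
```

Is that the whole difference, or is there more to it?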