L1-regularization for a single matrix

I have a hierarchical model with many components.
I want one particular matrix to be sparse, so I am trying to apply L1 regularization to only that matrix in my architecture.
So far, I have only found discussions about adding an L1 penalty to the final loss, but that way I would impose L1 on the whole model, while I want just this one matrix to be sparse.

Is there any way to do so?


You can add the L1 regularization to your loss and call backward on the sum of both.
Here is a small example using the weight matrix of the first linear layer to apply the L1 reg:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 2)
)

x = torch.randn(10, 10)
target = torch.empty(10, dtype=torch.long).random_(2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(1000):
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, target)
    # L1 penalty on the first linear layer's weight matrix only
    l1_norm = torch.norm(model[0].weight, p=1)
    loss += l1_norm
    loss.backward()
    optimizer.step()

    print('Epoch {}, loss {}, mat0 norm {}'.format(
        epoch, loss.item(), l1_norm.item()))

If you comment out loss += l1_norm, you'll see that the norm won't necessarily decrease.
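In practice you would usually scale the penalty with a small coefficient so it doesn't dominate the task loss. A minimal sketch, where l1_lambda is a hypothetical hyperparameter you would tune for your task:

l1_lambda = 1e-3  # hypothetical penalty strength; tune for your task
l1_norm = torch.norm(model[0].weight, p=1)
loss = criterion(output, target) + l1_lambda * l1_norm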


Thank you, that looks like what I was looking for!
In this way, even the weights in the second linear layer will be affected by the L1 reg, right?

Hi @ptrblck

Here you have shown how to apply l1 regularization on a single layer, using torch.norm(model[0].weight, p=1).

What if I have, say, 10 layers and want to apply L1 regularization to all of them? Can I do it in one shot, or do I need to iterate over each layer using model.parameters()?

You would have to iterate over the parameters, as shown in this example.
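For reference, a minimal sketch of that pattern, reusing the model, criterion, output, and target from the earlier example; it sums the L1 norms of all parameters and adds them to the loss (you could also filter specific layers via model.named_parameters()):

# accumulate the L1 norm of every parameter tensor in the model
l1_lambda = 1e-3  # hypothetical penalty strength
l1_norm = sum(p.abs().sum() for p in model.parameters())
loss = criterion(output, target) + l1_lambda * l1_norm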


Thank you for your answer. Now I want to know why your solution

torch.norm(model[0].weight, p=1)

doesn't work for the network below? It fails with:

l1_norm = torch.norm(model[2].weight, p=1)
TypeError: 'Net_test' object is not subscriptable

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net_test(nn.Module):

    def __init__(self):
        super(Net_test, self).__init__()
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


model = Net_test()

x = torch.randn(10, 10)
target = torch.empty(10, dtype=torch.long).random_(2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(10):
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, target)
    l1_norm = torch.norm(model[2].weight, p=1)  # raises the TypeError above
    loss += l1_norm
    loss.backward()
    optimizer.step()

    print('Epoch {}, loss {}, mat0 norm {}'.format(
        epoch, loss.item(), l1_norm.item()))

I found the solution :sweat_smile: (I was being careless):

l1_norm = torch.norm(model.fc1.weight, p=1)
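For context: integer indexing like model[0] only works for container modules such as nn.Sequential, which implements __getitem__; a custom nn.Module is accessed through its attribute names instead. A small sketch contrasting the two, reusing the Net_test class from above:

seq = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))
w_seq = seq[0].weight   # nn.Sequential supports integer indexing

net = Net_test()
w_net = net.fc1.weight  # custom modules are accessed by attribute name
# net[0] would raise TypeError: 'Net_test' object is not subscriptable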
