L1-regularization for a single matrix

I have a hierarchical model with many components.
I want one particular matrix to be sparse, so I am trying to apply L1 regularization only to this matrix in my architecture.
So far, I have only found discussions about adding an L1 penalty to the final loss function, but that way the penalty would act on the whole model, while I want just one matrix to be sparse.

Is there any way to do so?

You can add the L1 norm of that matrix to your loss and call backward() on the sum of both.
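Written out, the objective is roughly the following, where W is the matrix you want to be sparse and λ is a penalty weight (λ is an assumption added here for illustration; adding the norm directly to the loss corresponds to λ = 1):

$$
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}}\big(f(x), y\big) + \lambda \lVert W \rVert_1,
\qquad
\lVert W \rVert_1 = \sum_{i,j} \lvert W_{ij} \rvert
$$

$$
\frac{\partial}{\partial W}\,\lambda \lVert W \rVert_1 = \lambda\,\operatorname{sign}(W),
\qquad
\frac{\partial}{\partial \theta}\,\lambda \lVert W \rVert_1 = 0 \quad \text{for every other parameter } \theta
$$

Here sign(0) = 0 is used as the subgradient at zero, which is what PyTorch's autograd does for the absolute value. So the penalty pushes only W toward zero directly; the other parameters are driven by the cross-entropy term alone.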
Here is a small example using the weight matrix of the first linear layer to apply the L1 reg:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 2)
)

x = torch.randn(10, 10)
target = torch.empty(10, dtype=torch.long).random_(2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(1000):
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, target)
    # L1 norm (sum of absolute values) of the first linear layer's weight only
    l1_norm = torch.norm(model[0].weight, p=1)
    # Add the penalty so backward() also produces its gradient for model[0].weight
    loss += l1_norm
    loss.backward()
    optimizer.step()

    print('Epoch {}, loss {}, mat0 norm {}'.format(
        epoch, loss.item(), l1_norm.item()))

If you comment out loss += l1_norm, you’ll see that the norm won’t necessarily decrease.
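To see that the penalty term itself only acts on the chosen matrix, you can backpropagate it alone and inspect the gradients. A minimal sketch reusing the same toy model (the seed and the printout are only for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))

# Backpropagate ONLY the penalty term (no task loss) to see where its gradient goes
l1_norm = torch.norm(model[0].weight, p=1)
l1_norm.backward()

for name, param in model.named_parameters():
    # Parameters outside the penalty's graph keep grad=None after backward()
    touched = param.grad is not None and bool(param.grad.abs().sum() > 0)
    print(f"{name}: receives L1 gradient = {touched}")
```

Only 0.weight should report True (its gradient is sign(W)). The other layers still change during normal training, but only through the cross-entropy part of the loss.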

Thank you, this looks like what I was looking for!
In this way, even the weights in the second layer will be affected by the L1 reg, right?

Hi @ptrblck

Here you have shown how to apply L1 regularization to a single layer, using torch.norm(model[0].weight, p=1).

What if I have, say, 10 layers and want to apply L1 regularization to all of them? Can I do it in one shot, or do I need to iterate over each layer using model.parameters()?

You would have to iterate over the parameters as shown in this example.
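A minimal sketch of that iteration, assuming a hypothetical 10-layer MLP and a hypothetical penalty weight l1_lambda. It sums the L1 norms of all weight matrices and skips biases (a common choice, not a requirement). The loop over named_parameters() still runs every step, but it fits in a single expression:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical 10-layer model (9 hidden Linear layers + 1 output layer), for illustration only
layers = []
for _ in range(9):
    layers += [nn.Linear(10, 10), nn.ReLU()]
layers.append(nn.Linear(10, 2))
model = nn.Sequential(*layers)

x = torch.randn(10, 10)
target = torch.randint(0, 2, (10,))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)
l1_lambda = 1e-3  # hypothetical penalty strength; tune for your problem


def l1_penalty(module):
    # Sum |w| over every weight matrix; biases are skipped here (a choice, not a requirement)
    return sum(p.abs().sum() for name, p in module.named_parameters()
               if name.endswith("weight"))


for epoch in range(5):
    optimizer.zero_grad()
    loss = criterion(model(x), target) + l1_lambda * l1_penalty(model)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch}, loss {loss.item():.4f}")
```

To restrict the penalty to a subset of layers, filter on the parameter names in l1_penalty instead of penalizing every weight.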

Thank you for your answer. Now I want to know why your solution

torch.norm(model[0].weight, p=1)

doesn’t work for the network below. I get this error:

 l1_norm = torch.norm(model[2].weight, p=2)
TypeError: 'Net_test' object is not subscriptable
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net_test(nn.Module):

    def __init__(self):
        super(Net_test, self).__init__()
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


model = Net_test()

x = torch.randn(10, 10)
target = torch.empty(10, dtype=torch.long).random_(2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(10):
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, target)
    l1_norm = torch.norm(model[2].weight, p=1)
    loss += l1_norm
    loss.backward()
    optimizer.step()

    print('Epoch {}, loss {}, mat0 norm {}'.format(
        epoch, loss.item(), l1_norm.item()))

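I found the solution 😅 (I was careless about it). nn.Sequential supports integer indexing such as model[0], but a custom nn.Module subclass like Net_test does not, so its layers have to be accessed through their attribute names: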

l1_norm = torch.norm(model.fc1.weight, p=1)
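Both attribute access and a name-based lookup work for a custom module. A minimal sketch reusing the Net_test definition from the question; the lookup through named_parameters() is an alternative not mentioned in the thread:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net_test(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)


model = Net_test()

# A plain nn.Module defines no __getitem__, so integer indexing raises TypeError
try:
    model[0]
except TypeError as err:
    print("Indexing fails:", err)

# Option 1: access the layer through its attribute name
l1_attr = torch.norm(model.fc1.weight, p=1)

# Option 2: look the parameter up by its qualified name ("<attribute>.<param>")
params = dict(model.named_parameters())
l1_named = params["fc1.weight"].abs().sum()

print(l1_attr.item(), l1_named.item())
```

The name-based lookup is handy when the layer to penalize is chosen at runtime, for example from a config string.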

See this: