I have a hierarchical model with many components.
I want one particular matrix to be sparse, so I am trying to apply L1 regularization to only this matrix in my architecture.
So far, I have found discussions about applying L1 reg. to the final loss function, but that way I would impose the L1 penalty on the whole model, while I want just this one matrix to be sparse.
You can add the L1 regularization to your loss and call backward on the sum of both.
Here is a small example that applies the L1 reg. to the weight matrix of the first linear layer only:
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 2)
)

x = torch.randn(10, 10)
target = torch.empty(10, dtype=torch.long).random_(2)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(1000):
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, target)
    # L1 penalty on the first linear layer's weight matrix only
    l1_norm = torch.norm(model[0].weight, p=1)
    loss += l1_norm
    loss.backward()
    optimizer.step()
    print('Epoch {}, loss {}, mat0 norm {}'.format(
        epoch, loss.item(), l1_norm.item()))
If you comment out loss += l1_norm, you'll see that the norm won't necessarily decrease.
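Note that in practice you would usually scale the penalty with a small coefficient, e.g. loss += 1e-4 * l1_norm (the 1e-4 here is just an illustrative value, not something tuned for this example), since an unscaled L1 term can easily dominate the classification loss.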
Here you have shown how to apply L1 regularization on a single layer, using torch.norm(model[0].weight, p=1).
What if I have, say, 10 layers and want to apply L1 regularization to all of them? Can I do it in one shot, or do I need to iterate over each layer using model.parameters()?
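One possible approach (a sketch, not taken from the reply above): since model.parameters() is a generator, you can sum the per-parameter L1 norms in a single expression instead of writing an explicit loop:

# Reusing model, criterion, output and target from the example above.
# One-shot L1 penalty over all parameters (weights and biases):
l1_lambda = 1e-4  # hypothetical regularization strength
l1_norm = sum(p.abs().sum() for p in model.parameters())
loss = criterion(output, target) + l1_lambda * l1_norm

If you only want the weight matrices and not the biases, you can filter by parameter name:

l1_norm = sum(p.abs().sum()
              for name, p in model.named_parameters()
              if 'weight' in name)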