You can add the L1 regularization term to your loss and call backward() on the sum of both.
Here is a small example that applies the L1 penalty to the weight matrix of the first linear layer:
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 2)
)

x = torch.randn(10, 10)
target = torch.empty(10, dtype=torch.long).random_(2)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(1000):
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, target)

    # L1 penalty on the first linear layer's weight matrix
    l1_norm = torch.norm(model[0].weight, p=1)
    loss += l1_norm

    loss.backward()
    optimizer.step()

    print('Epoch {}, loss {}, mat0 norm {}'.format(
        epoch, loss.item(), l1_norm.item()))
If you comment out loss += l1_norm, you'll see that the norm won't necessarily decrease.
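If you want to control the strength of the penalty or regularize all parameters rather than a single weight matrix, here is a minimal sketch of one possible variation. The l1_lambda value and the sum over model.parameters() are assumptions for illustration, not part of the example above:

    # assumed regularization strength; tune for your use case
    l1_lambda = 1e-4
    # L1 norm summed over all trainable parameters
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    loss = criterion(output, target) + l1_lambda * l1_penalty

With a small l1_lambda the data loss still dominates the update, while the penalty pushes the weights toward zero.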