Since the L1 regularizer is not differentiable everywhere, what does PyTorch do when asked to differentiate this function? A simple example shows that PyTorch returns zero at the non-differentiable point x = 0.
import torch
# x = [-1.0, -0.5, 0.0, 0.5, 1.0]; x[2] = 0.0 is where |x| is not differentiable
x = torch.linspace(-1.0, 1.0, 5, requires_grad=True)
y = torch.abs(x)
# backpropagate only through y[2] = |x[2]| = |0|
y[2].backward()
print(x.grad)
tensor([-0., -0., 0., 0., 0.])
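We can isolate the behavior at the kink itself with a minimal sketch (independent of the example above, and the comparison with torch.sign is an observation, not a statement about PyTorch internals): differentiating |x| at exactly x = 0 yields 0, which matches the value of torch.sign at 0.
import torch
# Differentiate |x| at exactly x = 0: PyTorch returns 0 as the gradient.
x0 = torch.tensor(0.0, requires_grad=True)
torch.abs(x0).backward()
print(x0.grad)                         # tensor(0.)
# For comparison, torch.sign(0) is also 0, consistent with PyTorch picking
# the value 0 from the subdifferential [-1, 1] of |x| at x = 0.
print(torch.sign(torch.tensor(0.0)))   # tensor(0.)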
Why is this the case?