How does one implement weight regularization (L1 or L2) manually, without using optim's weight_decay?

Your code is almost correct.

l2_reg here is a Python scalar, so the operations done on it are not recorded for autograd and won't contribute to backward().
Instead, make l2_reg an autograd Variable, so the regularization term becomes part of the computation graph.

l2_reg = None
for W in mdl.parameters():
    if l2_reg is None:
        l2_reg = W.norm(2)               # first term starts the accumulator as a graph node
    else:
        l2_reg = l2_reg + W.norm(2)      # keep accumulating inside the graph
batch_loss = (1/N_train)*(y_pred - batch_ys).pow(2).sum() + l2_reg * reg_lambda
batch_loss.backward()
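
Since the question also asks about L1, the same pattern works by swapping in the 1-norm (sum of absolute values). A minimal sketch, assuming the same mdl, reg_lambda, N_train, y_pred and batch_ys as above:

l1_reg = None
for W in mdl.parameters():
    if l1_reg is None:
        l1_reg = W.norm(1)               # sum of absolute values of W
    else:
        l1_reg = l1_reg + W.norm(1)
batch_loss = (1/N_train)*(y_pred - batch_ys).pow(2).sum() + l1_reg * reg_lambda
batch_loss.backward()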