How does one implement weight regularization (L1 or L2) manually, without using torch.optim's built-in weight_decay?

According to the docs, providing an integer argument `ord` to torch.norm yields the following:

(https://pytorch.org/docs/stable/torch.html?highlight=norm#torch.norm)
sum(abs(x)**ord)**(1./ord)

Providing an integer argument of 2 and squaring the result would then correspond exactly to the definition of the L2 penalty, sum(w**2).
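A minimal sketch of using that as a manual penalty (the model, data, and lam here are placeholders, not from the original post):

```python
import torch
import torch.nn as nn

# Toy setup; any nn.Module works the same way.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

# torch.norm(p, 2) == sqrt(sum(p**2)), so squaring it
# recovers sum(p**2), the usual L2 penalty.
lam = 1e-4  # regularization strength (arbitrary here)
l2_penalty = sum(torch.norm(p, 2) ** 2 for p in model.parameters())

loss = criterion(model(x), y) + lam * l2_penalty
loss.backward()
```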

I can’t see why multiplying by 0.5 would improve anything. Except, of course, if you are using the 0.5 as the lambda, but then that value is entirely arbitrary.


For me, computing the regularization loss this way causes problems. I don’t know why, but I also obtain negative values during some iterations.

It’s practically inconsequential. Originally, people used the 1/2 factor because it makes the notation simpler when calculating the derivatives on paper. If you omit it, your optimal lambda values will simply be half as large as what other people (who used the common 1/2 formulation) report in their papers.
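Concretely, the 1/2 just cancels the 2 that the power rule produces, so the gradient comes out as λw instead of 2λw:

$$\frac{\partial}{\partial w}\Big(\frac{\lambda}{2}\,\|w\|_2^2\Big) = \lambda w \qquad \text{vs.} \qquad \frac{\partial}{\partial w}\Big(\lambda\,\|w\|_2^2\Big) = 2\lambda w$$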


You shouldn’t only divide the regularization by 2 but (more importantly) also by N_train; compare for instance https://towardsdatascience.com/understanding-the-scaling-of-l2-regularization-in-the-context-of-neural-networks-e3d25f8b50db.

If you want a quick explanation of why this should be done, look at the formulas that @Esteban_Lanter posted: the first term is simply the sum, whereas in @Brando_Miranda’s implementation this term is divided by N_train. If you want to use the latter, you also have to divide the regularization by the same scaling; otherwise you will destroy the balance between the two terms.
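A sketch of what that balancing looks like (n_train, lam, and the model are illustrative; the data loss is assumed to be a mean over the training samples):

```python
import torch
import torch.nn as nn

n_train = 1000
model = nn.Linear(10, 1)
criterion = nn.MSELoss()  # reduction='mean' -> data term is already / N
x, y = torch.randn(n_train, 10), torch.randn(n_train, 1)

lam = 1e-2
l2 = sum(p.pow(2).sum() for p in model.parameters())

# Scale the penalty by 1/(2 * n_train) so it matches the
# 1/N scaling of the mean data loss (and the 1/2 convention).
loss = criterion(model(x), y) + lam / (2 * n_train) * l2
```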

If you want a more detailed explanation of why this should be done (and why the factor 0.5 makes sense), have a look at the Bayesian interpretation of L2 (or L1) regularization.

Why has the OP used .norm on the weights? As per the L1 and L2 formulas, it should be either abs() (for L1) or pow(2) (for L2) on the model’s weights, shouldn’t it?

What am I missing here? Can anybody please explain it to me?
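They agree; .norm is just a shorthand. A quick check (w is an arbitrary tensor):

```python
import torch

w = torch.randn(5)

# L1: sum of absolute values == 1-norm
assert torch.allclose(w.abs().sum(), torch.norm(w, 1))

# L2 penalty: sum of squares == (2-norm) squared,
# which is why the norm gets squared in the snippets above.
assert torch.allclose(w.pow(2).sum(), torch.norm(w, 2) ** 2)
```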


This gives me:

RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
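That error typically means a leaf tensor with requires_grad=True was modified in place while autograd was tracking it. A guess at a minimal reproduction and one common fix (the update shown is illustrative, not the actual code that failed):

```python
import torch

w = torch.randn(3, requires_grad=True)  # leaf tensor

# w -= 0.1 * w  # RuntimeError: a leaf Variable that requires grad ...

# Wrapping the in-place update in no_grad() tells autograd
# not to track it, so the operation is allowed:
with torch.no_grad():
    w -= 0.1 * w
```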