malioboro
(Rian Adam)
October 8, 2017, 5:41pm
1
Sorry, I'm new to PyTorch and I can't find how PyTorch implements L2 regularization (weight_decay).
There are several formulations of L2 regularization out there; which one does PyTorch implement? It matters because it determines how large a value I need to assign.
Thank you.
richard
(Richard Zou)
October 9, 2017, 2:43pm
2
Looking at the code for the SGD optimizer in particular, it looks like it's implemented by adding weight_decay * data to the gradients. Does this answer your question?
malioboro
(Rian Adam)
October 9, 2017, 4:58pm
3
Why weight_decay * data? Doesn't the line:
if weight_decay != 0:
    d_p.add_(weight_decay, p.data)
mean weight_decay + data?
richard
(Richard Zou)
October 9, 2017, 5:19pm
4
That line means, in other notation:
d_p = d_p + weight_decay * p.data
Here’s a good article about why the L2 penalty is implemented by adding weight_decay * weight_i
to the gradient: https://stats.stackexchange.com/questions/29130/difference-between-neural-net-weight-decay-and-learning-rate
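To connect this back to the original question about which formula is used: adding weight_decay * w to the gradient corresponds to a penalty of (weight_decay / 2) * sum(w_i^2) in the loss. Here is a minimal sketch in plain Python (no PyTorch needed) that checks this numerically. The toy loss, the function names, and the numbers are all made up for illustration; this is not code from the optimizer.

```python
# Hypothetical toy setup: one scalar weight and a made-up base loss.
def base_loss(w):
    return (w - 3.0) ** 2


def penalized_loss(w, weight_decay):
    # L2 penalty written so that its gradient is exactly weight_decay * w
    return base_loss(w) + 0.5 * weight_decay * w ** 2


def numerical_grad(f, w, eps=1e-6):
    # Central finite difference approximation of df/dw
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w = 1.5
weight_decay = 0.1

# Gradient of the base loss alone (plays the role of d_p)
d_p = numerical_grad(base_loss, w)

# What the SGD line does: d_p = d_p + weight_decay * p
d_p_with_decay = d_p + weight_decay * w

# Gradient of the loss with the penalty added explicitly
d_p_explicit = numerical_grad(lambda x: penalized_loss(x, weight_decay), w)

# The two values should agree up to floating-point error
print(d_p_with_decay, d_p_explicit)
```

So if your formula writes the penalty as lambda * sum(w_i^2) with no factor of 1/2, the matching setting under this form would be weight_decay = 2 * lambda.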
malioboro
(Rian Adam)
October 10, 2017, 7:26am
5
Thank you for the explanation and the reference.
Where can I find documentation showing that d_p.add_(weight_decay, p.data) means d_p = d_p + weight_decay * p.data?
trypag
(Pierre Antoine Ganaye)
October 10, 2017, 8:38am
6
torch.add(input, value=1, other, out=None)
Each element of the Tensor other is multiplied by the scalar value and added to each element of the Tensor input. The resulting Tensor is returned.
The shapes of input and other must be broadcastable.
out = input + (other * value)
If other is of type FloatTensor or DoubleTensor, value must be a real number, otherwise it should be an integer.
http://pytorch.org/docs/master/torch.html
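For anyone reading this later: newer PyTorch releases spell the scalar multiplier as a keyword argument, alpha, rather than the positional value shown in the signature above. A small sketch, assuming a recent PyTorch version; the tensors and the weight_decay value are made-up stand-ins, not taken from the optimizer code:

```python
import torch

weight_decay = 0.1
p = torch.tensor([1.0, -2.0, 3.0])    # stand-in for p.data
d_p = torch.tensor([0.5, 0.5, 0.5])   # stand-in for the gradient

# Out-of-place: result = d_p + weight_decay * p
result = torch.add(d_p, p, alpha=weight_decay)

# In-place version, equivalent to the SGD line discussed in this thread
d_p.add_(p, alpha=weight_decay)

print(result)
print(d_p)
```

Both printed tensors should hold the same values, since the in-place call applies the same scaled addition directly to d_p.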
malioboro
(Rian Adam)
October 10, 2017, 9:04am
7
Thank you, it's clear now.