I wanted to do it manually, so I implemented it as follows:
```python
reg_lambda = 1.0
l2_reg = 0
for W in mdl.parameters():
    l2_reg += W.norm(2)
batch_loss = (1/N_train)*(y_pred - batch_ys).pow(2).sum() + reg_lambda*l2_reg

## BACKWARD PASS
batch_loss.backward()  # Use autograd to compute the backward pass. Now W will have gradients
```
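For context, here is the same penalty-plus-backward pattern as a minimal self-contained sketch; the toy model, data, and shapes below are made up for illustration and are not part of my actual code:

```python
import torch

# Hypothetical toy model and data, only to make the pattern runnable end to end
N_train = 64
X = torch.randn(N_train, 3)
batch_ys = torch.randn(N_train, 1)
mdl = torch.nn.Linear(3, 1)

reg_lambda = 1.0

y_pred = mdl(X)

# Accumulate the penalty over all parameters (raw 2-norm, as in the snippet above)
l2_reg = 0
for W in mdl.parameters():
    l2_reg = l2_reg + W.norm(2)

batch_loss = (1 / N_train) * (y_pred - batch_ys).pow(2).sum() + reg_lambda * l2_reg
batch_loss.backward()  # each parameter's .grad now includes the penalty's contribution
```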
Is this correct? The key part I care about is that the SGD update works correctly, i.e.:
```python
## SGD update
for W in mdl.parameters():
    delta = eta * W.grad.data
    W.data.copy_(W.data - delta)  # in-place update: W <- W - eta * W.grad
```
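For comparison, here is a sketch of the same plain update done with the built-in optimizer, reusing the hypothetical toy names from the sketch above (note that `.grad` must be cleared between iterations either way):

```python
import torch

# Same hypothetical toy setup as before
N_train = 64
X = torch.randn(N_train, 3)
batch_ys = torch.randn(N_train, 1)
mdl = torch.nn.Linear(3, 1)

reg_lambda = 1.0
eta = 0.01

optimizer = torch.optim.SGD(mdl.parameters(), lr=eta)

optimizer.zero_grad()  # clear stale gradients; the manual loop would need an equivalent W.grad.data.zero_()
l2_reg = sum(W.norm(2) for W in mdl.parameters())
batch_loss = (1 / N_train) * (mdl(X) - batch_ys).pow(2).sum() + reg_lambda * l2_reg
batch_loss.backward()
optimizer.step()  # per parameter: W <- W - eta * W.grad, same as the manual update above
```

If I understand the docs correctly, passing `weight_decay=reg_lambda` to `torch.optim.SGD` instead adds `reg_lambda * W` to each parameter's gradient, which is the usual shortcut for an L2 penalty.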
i.e. that the gradient used in this update actually picks up the 2*reg_lambda*W term from the regularizer in the SGD update.
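One way to sanity-check that 2*W term is to compare autograd's gradient with the analytic one. A small sketch, assuming a squared-norm penalty `reg_lambda * ||W||^2` (with the plain `W.norm(2)` penalty above, the contribution would be `reg_lambda * W / ||W||` instead):

```python
import torch

reg_lambda = 1.0
W = torch.randn(4, 3, requires_grad=True)

penalty = reg_lambda * W.pow(2).sum()  # reg_lambda * ||W||^2
penalty.backward()

# d/dW [ reg_lambda * ||W||^2 ] = 2 * reg_lambda * W
print(torch.allclose(W.grad, 2 * reg_lambda * W.detach()))  # expected: True
```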
Here is a related question: Simple L2 regularization?