I wanted to do it manually, so I implemented it as follows:

```
reg_lambda = 1.0
l2_reg = 0
for W in mdl.parameters():
    l2_reg += W.norm(2)
batch_loss = (1/N_train) * (y_pred - batch_ys).pow(2).sum() + reg_lambda * l2_reg
## BACKWARD PASS
batch_loss.backward()  # use autograd to compute the backward pass; W.grad is now populated
```
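
For what it's worth, here is a minimal standalone sketch (a hypothetical toy tensor, not the actual model) of checking what gradient an L2 term produces under autograd. Note it uses the *squared* norm, `norm(2) ** 2`, so that the analytic gradient is exactly `2 * reg_lambda * w`:

```
import torch

# toy parameter standing in for one W in mdl.parameters()
reg_lambda = 1.0
w = torch.tensor([3.0, -4.0], requires_grad=True)

l2_reg = w.norm(2) ** 2          # squared L2 norm: ||w||^2
loss = reg_lambda * l2_reg
loss.backward()

print(w.grad)                    # should equal 2 * reg_lambda * w, i.e. [6., -8.]
```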

Is this correct? The key part I care about is that the SGD update works correctly, i.e.:

```
## SGD update
for W in mdl.parameters():
    delta = eta * W.grad.data
    W.data.copy_(W.data - delta)  # W - eta*g + A*gdl_eps
```

has the `2w` term in the SGD update.
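
As a quick sanity check of that update rule in isolation, here is a minimal sketch with a hypothetical toy tensor and a toy loss whose gradient is known in closed form:

```
import torch

# one manual SGD step on a toy parameter, mirroring the loop above
eta = 0.1
W = torch.tensor([1.0, 2.0], requires_grad=True)

loss = (W ** 2).sum()            # toy loss; its gradient is 2*W
loss.backward()

delta = eta * W.grad.data
W.data.copy_(W.data - delta)     # W <- W - eta * grad

print(W.data)                    # [1 - 0.1*2, 2 - 0.1*4] = [0.8, 1.6]
```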

Here is a related question: Simple L2 regularization?