The weight_decay argument of the optimizer might be too aggressive in PyTorch, as it’ll add all parameters passed to the optimizer to the regularization term.
Have a look at this post to exclude batch norm params etc. (or just add the conv parameters to the regularization).
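Something along these lines might work as a starting point. It's only a sketch, not necessarily the approach from the linked post: the helper name `split_decay_params`, the toy model, and the hyperparameter values are made up for illustration. It puts only the conv weights into a parameter group with weight decay and everything else (biases, batch norm params) into a group with `weight_decay=0.0`.

```python
import torch
import torch.nn as nn


def split_decay_params(model, weight_decay=1e-4):
    """Hypothetical helper: decay conv weights only, nothing else."""
    decay, no_decay = [], []
    for module in model.modules():
        # recurse=False yields only the parameters owned directly by this module
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            is_conv = isinstance(module, (nn.Conv1d, nn.Conv2d, nn.Conv3d))
            if is_conv and name == "weight":
                decay.append(param)
            else:
                # biases, batch norm weight/bias, and any other parameters
                no_decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Toy model, just to show the usage
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3),
)

param_groups = split_decay_params(model, weight_decay=1e-4)
# Per-group options override the optimizer-wide defaults
optimizer = torch.optim.SGD(param_groups, lr=0.1, momentum=0.9)
```

If you also want to regularize the linear layers, you could add `nn.Linear` to the `isinstance` check.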