Hi
@julioeu99 weight decay, in simple terms, shrinks the weights a little on every update, scaled by a constant (here 1e-2). This keeps the weights from growing too large, which can otherwise lead to early overfitting.
Weight decay can sometimes make the model converge more slowly.
By default, PyTorch optimizers use weight_decay=0.
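If it helps, here is a minimal sketch of how you would turn it on in PyTorch (the toy `nn.Linear` model and the lr value are just placeholders for illustration):

```python
import torch
import torch.nn as nn

# Toy model just for illustration.
model = nn.Linear(10, 1)

# weight_decay adds an L2-style penalty: each step the weights are also
# pulled toward zero in proportion to weight_decay (1e-2 here).
# PyTorch's default is weight_decay=0, i.e. no penalty at all.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-2)
```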
Some useful discussions on the topic: