Adaptive L2 regularization experiment

Hi all, I'm an amateur getting caught up with PyTorch. I started tinkering and had a little idea: make the L2 regularization strength adapt with respect to the loss per epoch. Take these results with a grain of salt.

There's probably not enough data/runs to draw any conclusions; this was mainly a learning exercise for me, but who knows, maybe it's useful to someone else.
Here is a paper that confirmed my suspicions and gave me some more empirical backing (though I admit my implementation is much simpler)…
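For anyone curious what I mean by "adapt with respect to the loss", here's a rough sketch (not my exact code; the synthetic data, the `base_lam` name, and the specific scaling rule of lambda tracking the ratio of current to initial loss are just illustrative choices):

```python
# Sketch of adaptive L2: the penalty strength `lam` is rescaled each
# epoch by the ratio of the current data loss to the initial data loss,
# so regularization relaxes as the model fits the data.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 20)          # toy data standing in for MNIST
y = torch.randint(0, 2, (256,))

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

base_lam = 1e-3   # starting L2 coefficient (illustrative value)
lam = base_lam
init_loss = None

for epoch in range(20):
    opt.zero_grad()
    data_loss = loss_fn(model(X), y)
    # add the L2 penalty manually (instead of optimizer weight_decay)
    # so `lam` can change between epochs
    l2 = sum((p ** 2).sum() for p in model.parameters())
    (data_loss + lam * l2).backward()
    opt.step()

    if init_loss is None:
        init_loss = data_loss.item()
    # one possible adaptation rule: shrink lam as the loss falls
    lam = base_lam * data_loss.item() / init_loss
```

The point of adding the penalty to the loss by hand (rather than using the optimizer's `weight_decay` argument) is just that it makes the per-epoch update of `lam` explicit.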

Any feedback would be appreciated, thanks!

I haven’t read the paper, but I’d be interested to know how many runs were performed to create these plots. Do these lines represent a single run?


Yeah, they're single runs. It should be very easy to reproduce (just copy-paste and run the GitHub code if you have all the dependencies), though results may vary (I hope not by a lot, lol). This was with the MNIST dataset, and just on CPU.