I was wondering how to do L1 weight regularization on the first Linear layer, for feature engineering. Out of curiosity I want to see what an MLP thinks the top N features are. I read this post
I’m probably mistaken, but this seems wrong… past answers recommend for W in model.parameters()
, so in my case, where my model’s first Linear layer is L1
it would be for W in model.L1.parameters()
. But this includes the bias term!
Most posts are guilty of this; however, I saw one that is in line with my expectation.
So what’s going on here, and who is mistaken? I suspect regularizing the bias probably isn’t too harmful: the bias penalty will be tiny, and it doesn’t matter at all if normalization is applied directly afterward.
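For what it’s worth, here is a minimal sketch of how I imagine the bias could be excluded, by penalizing only the first layer’s weight matrix (model.L1.weight) rather than iterating over model.L1.parameters(). The layer sizes and the lambda_l1 value are made-up placeholders, not anything from the posts I read:

```python
import torch
import torch.nn as nn

# Toy MLP whose first Linear layer is named L1, matching the naming above.
# The sizes here (10 -> 32 -> 2) are arbitrary placeholders.
class MLP(nn.Module):
    def __init__(self, in_features=10, hidden=32, out_features=2):
        super().__init__()
        self.L1 = nn.Linear(in_features, hidden)
        self.L2 = nn.Linear(hidden, out_features)

    def forward(self, x):
        return self.L2(torch.relu(self.L1(x)))

model = MLP()
lambda_l1 = 1e-3  # hypothetical regularization strength

# Penalize ONLY the weight matrix, not the bias vector:
l1_penalty = lambda_l1 * model.L1.weight.abs().sum()

# During training this term would be added to the task loss, e.g.:
#   loss = criterion(model(x), y) + l1_penalty

# To read off the top-N input features afterward, one option is to sum
# |weight| over the hidden units for each input column:
importance = model.L1.weight.abs().sum(dim=0)  # shape: (in_features,)
top_n = torch.topk(importance, k=5).indices    # indices of the 5 largest
```

Under this reading, the for W in model.L1.parameters() loop is only wrong in that it sweeps in L1.bias along with L1.weight; filtering to the weight tensor directly sidesteps that.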