I was wondering how to do L1 weight regularization on the first Linear layer, for feature engineering. Out of curiosity I want to see what an MLP thinks the top N features are. I read this post
I'm probably mistaken, but this seems wrong. Past answers recommend iterating with
for W in model.parameters(), so in my case, where my model's first Linear layer is
named L1, that would be
for W in model.L1.parameters(). But this includes the bias term!
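To make the issue concrete, here's a minimal sketch (the MLP architecture and the layer name L1 are just placeholders for my setup) showing that parameters() yields the bias as well as the weight, and how to penalize the weight alone:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_features=10, hidden=32):
        super().__init__()
        self.L1 = nn.Linear(in_features, hidden)  # first Linear layer, named L1
        self.L2 = nn.Linear(hidden, 1)

    def forward(self, x):
        return self.L2(torch.relu(self.L1(x)))

model = MLP()

# parameters() yields BOTH the weight and the bias:
names = [name for name, _ in model.L1.named_parameters()]
print(names)  # ['weight', 'bias']

# L1 penalty that (perhaps unintentionally) includes the bias:
penalty_with_bias = sum(p.abs().sum() for p in model.L1.parameters())

# L1 penalty on the weight only:
penalty_weight_only = model.L1.weight.abs().sum()
```

So the loop-over-parameters recipe does sweep the bias into the penalty; restricting to model.L1.weight is what matches my expectation.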
Most posts are guilty of this; however, I saw one that is in line with my expectation.
So what's going on here, and who is mistaken? I think regularizing the bias probably isn't too harmful: it will be tiny, and it doesn't matter at all if a normalization layer is applied directly after.
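As for the top-N question itself, one sketch of what I had in mind (again using a hypothetical first layer; column j of a Linear weight matrix corresponds to input feature j) is to rank features by the L1 norm of each weight column after training:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
first_layer = nn.Linear(10, 32)  # stand-in for a trained model's first layer

# nn.Linear stores weight with shape [out_features, in_features],
# so summing |w| over dim 0 gives one score per input feature.
feature_scores = first_layer.weight.abs().sum(dim=0)  # shape: [10]

N = 3
top_n = torch.topk(feature_scores, N).indices
print(top_n.tolist())  # indices of the N highest-scoring input features
```

With the weight-only L1 penalty driving unimportant columns toward zero, the surviving high-norm columns would be the features the MLP "cares about".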