Apologies, I was wrong. For a simple example with scalars, y = w * x, even if w = 0, dy/dw = x, so the weights can indeed change from 0. I agree with @bzcheeseman: masking in the forward pass seems like a reasonable way to accomplish what you want.
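To make the scalar case concrete, here is a minimal plain-Python sketch (no framework, so it is not how your actual model would be written). The names `grad_w`, `unmasked`, `masked`, and `mask` are made up for illustration, and the gradient is only estimated with a finite difference. It shows that a weight starting at 0 still gets the gradient x, while multiplying by a fixed 0 mask in the forward pass gives that weight a zero gradient.

```python
def grad_w(f, w, x, eps=1e-6):
    """Central finite-difference estimate of df/dw at (w, x)."""
    return (f(w + eps, x) - f(w - eps, x)) / (2 * eps)


def unmasked(w, x):
    # Plain forward pass: y = w * x
    return w * x


def masked(w, x, mask=0.0):
    # Forward pass with a fixed (non-trainable) mask: y = (mask * w) * x.
    # With mask = 0 this weight contributes nothing to the output.
    return (mask * w) * x


x, w, lr = 3.0, 0.0, 0.1

g_plain = grad_w(unmasked, w, x)   # close to x, even though w == 0
g_masked = grad_w(masked, w, x)    # zero, because the mask blocks the gradient

# One gradient-descent step on y itself (a stand-in for a real loss)
w_plain = w - lr * g_plain         # moves away from 0
w_masked = w - lr * g_masked       # stays at 0

print(g_plain, g_masked, w_plain, w_masked)
```

Note that anything that changes weights independently of the gradient (weight decay, for example) is not covered by this sketch, but since the mask is applied in the forward pass, the masked weight would still have no effect on the output.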