So, I am doing something a bit unusual. I have a neural network with a tanh layer, and the problem is that the weight magnitudes grow too large during training, so the outputs quickly start clumping at -1 and 1.
I have been reading about different regularization techniques, for example L1/L2 norm penalties, but they assume you add the weight norm to the error term.
But I do not really have a loss function…
I only have the weight gradient (G), so my learning step looks like W = W + G*lr.
The question is: what can I do to prevent the weights from growing too much when all I have access to is the weights and the gradient?
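One idea I have (not sure if my reasoning is right): the L2 penalty only contributes its own gradient, which is just a term proportional to W, so maybe I can fold it straight into my update as weight decay without ever having a loss function. A rough sketch of what I mean, where lr and wd are just placeholder values I would have to tune:

import torch

W = torch.randn(64, 128)   # pretend weights
G = torch.randn_like(W)    # pretend gradient coming from my training

lr = 1e-3   # learning rate
wd = 1e-2   # decay strength, just a guess

W.add_(G, alpha=lr)    # my usual step: W = W + G*lr
W.mul_(1.0 - lr * wd)  # weight decay: same shrinking effect as adding 0.5*wd*||W||^2 to a loss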
I have also tried a few things, with mixed results. For example, dividing the weights at each learning step by the square of their L2 norm:
w_norm = torch.linalg.vector_norm(W, dim=1, keepdim=True)  # per-row L2 norm of the weights
W.div_(w_norm.pow(2))                                      # shrink each row by its squared norm
It sort of works (the weight magnitudes stop growing constantly), but some outputs still end up clumped.
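Another thing I am thinking about trying is to only shrink a row when its norm goes above some cap (a max-norm constraint), instead of always dividing, so that small rows are left alone. Rough sketch, where the max_norm value is just a guess:

import torch

max_norm = 3.0   # cap on each row's L2 norm, needs tuning

W = torch.randn(64, 128) * 5   # pretend weights that already grew too large
row_norms = torch.linalg.vector_norm(W, dim=1, keepdim=True)
# factor is < 1 only for rows whose norm exceeds the cap, 1 everywhere else
W.mul_(torch.clamp(max_norm / row_norms, max=1.0))

Would either of these be a reasonable direction, or is there a more standard trick when all you have is W and G?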