So, I am doing something a bit unusual. I have a neural network with a tanh layer, and the problem is that the weight magnitudes grow too large during training, so the outputs quickly start clumping at -1 and 1.
I have been reading about different regularization techniques, for example L1/L2 norm penalties, but they assume you add the weight norm to the error term.
But I do not really have a loss function…
I only have the weight gradient (G), so my learning step looks like W = W + G*lr.
The question is: what can I do to prevent the weights from growing too much when all I have access to is the weights and the gradient?
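One idea I have (not sure if my reasoning is right): the L2 penalty only contributes its own gradient, which is just a term proportional to W, so maybe I can fold it straight into my update as weight decay without ever having a loss function. A rough sketch of what I mean, where lr and wd are just placeholder values I would have to tune:

import torch

W = torch.randn(64, 128)   # pretend weights
G = torch.randn_like(W)    # pretend gradient coming from my training

lr = 1e-3   # learning rate
wd = 1e-2   # decay strength, just a guess

W.add_(G, alpha=lr)    # my usual step: W = W + G*lr
W.mul_(1.0 - lr * wd)  # weight decay: same shrinking effect as adding 0.5*wd*||W||^2 to a loss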
I have also tried a few things, with mixed results. For example, dividing the weights at each learning step by the square of their L2 norm:
w_norm = torch.linalg.vector_norm(W, dim=1, keepdim=True)  # per-row L2 norm of the weights
W.div_(w_norm.pow(2))                                      # shrink each row by its squared norm
It sort of works (the weight magnitudes stop growing constantly), but some outputs still end up clumped.
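Another thing I am thinking about trying is to only shrink a row when its norm goes above some cap (a max-norm constraint), instead of always dividing, so that small rows are left alone. Rough sketch, where the max_norm value is just a guess:

import torch

max_norm = 3.0   # cap on each row's L2 norm, needs tuning

W = torch.randn(64, 128) * 5   # pretend weights that already grew too large
row_norms = torch.linalg.vector_norm(W, dim=1, keepdim=True)
# factor is < 1 only for rows whose norm exceeds the cap, 1 everywhere else
W.mul_(torch.clamp(max_norm / row_norms, max=1.0))

Would either of these be a reasonable direction, or is there a more standard trick when all you have is W and G?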