However, I’m a bit afraid that this approach can become unstable (what if the filter norm gets really close to zero?).

The other method I was thinking of using is to reproject the filter tensor onto the unit-norm sphere after every .backward() step, i.e. divide it by its norm at every step. However, I’m not sure how to manually change the values of a tensor without breaking autograd.

You identified the two possible options quite well:

Use projected gradient methods. You can rescale the weights inside a with torch.no_grad(): block so that the in-place update doesn’t interfere with autograd.
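A minimal sketch of this, assuming a toy 1-D filter and loss (all shapes and the loss here are illustrative, not from the original question):

```python
import torch
import torch.nn.functional as F

# Projected gradient descent on a unit-norm filter (toy example).
torch.manual_seed(0)
filt = torch.randn(4, requires_grad=True)   # the constrained filter
signal = torch.randn(1, 1, 16)              # (batch, channels, length)
opt = torch.optim.SGD([filt], lr=0.1)

for _ in range(50):
    opt.zero_grad()
    filtered_out = F.conv1d(signal, filt.view(1, 1, -1))
    loss = torch.norm(filtered_out) ** 2
    loss.backward()
    opt.step()
    # Projection step: renormalize the filter outside autograd so the
    # in-place division is not recorded in the computation graph.
    with torch.no_grad():
        filt /= filt.norm()

print(filt.norm().item())  # ≈ 1.0 after every step
```

The in-place division on a leaf tensor is only legal because it happens inside the no_grad() block.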

Reformulate the optimization problem so that the constraint is enforced by the structure of the function itself. You can add an epsilon to the norm to make sure it never reaches 0.
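A sketch of this reparameterization, assuming an unconstrained tensor v and an eps value of 1e-8 (both are illustrative choices):

```python
import torch

# Reparameterization: optimize an unconstrained tensor v, and use
# w = v / (v.norm() + eps) wherever the unit-norm filter is needed.
# eps guards the division when v.norm() is near zero.
torch.manual_seed(0)
v = torch.randn(4, requires_grad=True)
eps = 1e-8

w = v / (v.norm() + eps)   # (approximately) unit norm, differentiable in v
loss = w.sum() ** 2        # any downstream loss works here
loss.backward()            # gradients flow through the normalization to v

print(w.norm().item())     # ≈ 1.0
```

Since w is rebuilt from v at every forward pass, no manual projection step is needed.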

There is no strong reason, though, for one to be better than the other. You’ll have to check empirically!

Thanks a lot for your reply!
I like the first method; I will try to write some working code using torch.no_grad() (even if it’s my first time using it).

For the sake of completeness, regarding the second method you posted: I’m a bit reluctant to believe that adding a small epsilon to the norm is enough. For example, assuming that I’m trying to minimize the following loss

loss = torch.norm(filtered_out)**2

then the filter will probably converge to [0,0,0,0].
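This collapse can be seen in a toy sketch: with w = v / (v.norm() + eps), minimizing a loss of the form ||w * x||**2 still drives v toward zero, because the eps term lets the effective norm of w shrink below 1. The elementwise "filter", the exaggerated eps of 0.5 (chosen so the effect is visible in a few steps), and all other values below are assumptions for illustration:

```python
import torch

# Toy demonstration of the collapse concern with the eps reparameterization.
torch.manual_seed(0)
x = torch.randn(4)
v = torch.randn(4, requires_grad=True)
eps = 0.5                              # exaggerated on purpose
opt = torch.optim.SGD([v], lr=0.1)

start_norm = v.norm().item()
for _ in range(200):
    opt.zero_grad()
    w = v / (v.norm() + eps)           # "unit-norm" filter, up to eps
    loss = torch.norm(w * x) ** 2      # stand-in for ||filtered_out||**2
    loss.backward()
    opt.step()

# The norm of v has shrunk: the constraint is only approximately enforced.
print(start_norm, v.norm().item())
```

With a tiny eps the shrinkage is much slower, but the gradient pushing v toward zero never fully vanishes.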