Imagine I have a neural network f(x; w), where x is my input and w is its weight vector. The weight vector w itself depends on some parameter u, i.e. w = g(u), where g(.) is some function. I want to optimize over u. What is the most convenient way to do it?
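To make the setup concrete: as long as w is produced from u by differentiable tensor operations, autograd can trace gradients back to u. A minimal sketch with a toy shape and no actual network:

```python
import torch

# u is the parameter we actually want to optimize
u = torch.randn(5, requires_grad=True)

# w = g(u): built from differentiable ops, so it stays in the autograd graph
w = torch.sin(u) / torch.norm(u)

# any scalar loss built from w backpropagates all the way to u
loss = (w ** 2).sum()
loss.backward()
print(u.grad)  # gradient of the loss with respect to u
```

The difficulty described below is not with g itself, but with getting w into the layers of a real model without severing this graph.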
To give an example, let w = sin(u) / ||u|| (here ||u|| is the norm of u), and let the model architecture f(x; w) be ResNet-18. How can we optimize over u in this case? As far as I understand, if I just make the model take an additional parameter u during initialization, compute w, and set the layers' parameters to w, this will not work. The problem is that each layer uses nn.Parameter() under the hood, and nn.Parameter() ignores the history of computation (we have computed w = torch.sin(u) / torch.norm(u) and then set the layers' parameters to w), so gradients for u will not be computed. Currently, I see the following two solutions:
- Before each iteration, compute w from u and update each layer's parameters. Then, when the gradient with respect to w is computed, update u manually (if the gradient dw/du is not too difficult to compute by hand).
- Rewrite ResNet-18 completely from scratch so that it takes the computed w as input, does not use nn.Parameter(), and calls everything via torch.nn.functional. This way the gradient with respect to u will be computed automatically (so it's like TensorFlow before version 2.0).
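For a single linear layer (not the full ResNet-18), the second option would look roughly like this; the shapes here are made up purely for illustration:

```python
import torch
import torch.nn.functional as F

u = torch.randn(3, 4, requires_grad=True)  # the underlying parameter
x = torch.randn(2, 4)                      # a toy input batch

# compute w = g(u) and feed it straight into a functional layer,
# so no nn.Parameter ever detaches it from the graph
w = torch.sin(u) / torch.norm(u)
y = F.linear(x, w)                         # y = x @ w.T

loss = y.pow(2).mean()
loss.backward()                            # u.grad is filled in automatically
```

Doing this for every conv, batch-norm, and linear layer of ResNet-18 is exactly the tedium in question.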
Both of these solutions are quite tedious. Are there any better alternatives?
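As an aside on the first option: the manual dw/du step can itself be delegated to autograd via a vector-Jacobian product, so no hand-derived Jacobian is needed. A sketch for one linear layer (toy shapes, not ResNet-18):

```python
import torch

u = torch.randn(3, 4, requires_grad=True)
layer = torch.nn.Linear(4, 3, bias=False)
x = torch.randn(2, 4)

# 1) compute w = g(u); keep the u -> w graph alive
w = torch.sin(u) / torch.norm(u)

# 2) copy w into the layer's nn.Parameter (this copy is what breaks the graph)
with torch.no_grad():
    layer.weight.copy_(w)

# 3) ordinary forward/backward; the gradient lands on layer.weight
loss = layer(x).pow(2).mean()
loss.backward()

# 4) push dL/dw back through g with a vector-Jacobian product
(u_grad,) = torch.autograd.grad(w, u, grad_outputs=layer.weight.grad)
```

This avoids writing dw/du by hand, but one still has to repeat the copy-and-propagate dance for every parameter tensor on every iteration.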