Imagine I have a neural network f(x; w), where x is the input and w is the weight vector. w itself depends on some parameter u via w = g(u), where g(·) is some function. I want to optimize over u. What is the most convenient way to do this?
To give a concrete example, let w = sin(u) / ||u|| (here ||u|| is the norm of u), and let the model architecture f(x; w) be ResNet-18. How can we optimize over u in this case?
As far as I understand, if I simply make the model take an additional parameter u at initialization, compute w, and set the layer parameters to w, this will not work. The problem is that each layer uses nn.Parameter() under the hood, and nn.Parameter() discards the computation history (we compute w = torch.sin(u) / torch.norm(u) and then set the layer parameters to w), so gradients for u will not be computed.
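A minimal sketch of this failing approach, assuming torchvision's resnet18 and reparameterizing only the first convolution (the shapes and the choice of layer are my own assumptions):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# u is the parameter we actually want to optimize; conv1 of ResNet-18 has
# 64 * 3 * 7 * 7 = 9408 weights, so u is sized to match it here.
u = torch.randn(9408, requires_grad=True)
w = torch.sin(u) / torch.norm(u)                 # w = g(u), still connected to u

model = resnet18()
# Wrapping w in nn.Parameter copies the values but cuts the graph back to u:
model.conv1.weight = nn.Parameter(w.view_as(model.conv1.weight))

out = model(torch.randn(1, 3, 224, 224))
out.sum().backward()
print(u.grad)                                    # None -- no gradient reaches u
```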
Currently, I see the following two solutions:
- Before each iteration, compute w from u and update each layer's parameters. Then, once the gradient with respect to w has been computed, update u manually (if the gradient dw/du is not too difficult to derive by hand). A sketch of this approach is given after the list.
- Rewrite ResNet-18 completely from scratch so that it takes the computed w as input, does not use nn.Parameter(), and calls everything via torch.nn.functional. In this way the gradient with respect to u is computed automatically (so it's like TensorFlow before version 2.0). A sketch of this approach is also given after the list.
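A minimal sketch of the first solution, again assuming torchvision's resnet18, a dummy batch, and reparameterizing only conv1 (all of these are my own assumptions). Instead of deriving dw/du by hand, one can also let autograd chain dL/dw back through g with a second backward pass:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18()
u = torch.randn(model.conv1.weight.numel(), requires_grad=True)
opt = torch.optim.SGD([u], lr=1e-2)

x = torch.randn(8, 3, 224, 224)                  # dummy batch; a real DataLoader would go here
y = torch.randint(0, 1000, (8,))

# 1) recompute w from the current u and copy it into the layer (no graph needed here)
with torch.no_grad():
    w = torch.sin(u) / torch.norm(u)
    model.conv1.weight.copy_(w.view_as(model.conv1.weight))

# 2) an ordinary forward/backward pass leaves dL/dw in model.conv1.weight.grad
model.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
grad_w = model.conv1.weight.grad.reshape(-1)

# 3) chain rule: backprop dL/dw through w = g(u) to obtain dL/du in u.grad
opt.zero_grad()
w = torch.sin(u) / torch.norm(u)                 # recomputed with the graph attached to u
w.backward(gradient=grad_w)
opt.step()
```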
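And a minimal sketch of the second, purely functional solution. A full functional ResNet-18 would be long, so this only shows the pattern on a toy two-layer network (the layer sizes are my own assumptions); the key point is that w stays in the autograd graph, so u.grad is filled automatically:

```python
import torch
import torch.nn.functional as F

# The forward pass takes w explicitly and uses only torch.nn.functional,
# so the computation stays differentiable with respect to u.
def forward(x, w):
    w1 = w[: 64 * 784].view(64, 784)
    w2 = w[64 * 784 :].view(10, 64)
    h = F.relu(F.linear(x, w1))
    return F.linear(h, w2)

n_params = 64 * 784 + 10 * 64
u = torch.randn(n_params, requires_grad=True)
opt = torch.optim.SGD([u], lr=1e-2)

x = torch.randn(32, 784)                         # dummy batch
y = torch.randint(0, 10, (32,))

opt.zero_grad()
w = torch.sin(u) / torch.norm(u)                 # w = g(u), kept in the autograd graph
loss = F.cross_entropy(forward(x, w), y)
loss.backward()                                  # autograd fills u.grad through g
opt.step()
```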
Both of these solutions are quite tedious. Are there any better alternatives?