I want to create a linear layer whose weight is a weighted sum of the weights of two other linear layers. The weighting factor is denoted alpha, i.e.

outputs = [alpha*W1+(1-alpha)*W2] * inputs

I want the gradients of the loss with respect to W1, W2, and alpha, but after loss.backward() all of these gradients are None. It seems that W1, W2, and alpha are not part of the computational graph at all. Is there a way to incorporate them into the graph?
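
Below is a minimal sketch of one way this can work. It assumes W1, W2, and alpha are registered as nn.Parameter on a custom module; the class name InterpolatedLinear is hypothetical. The key point is that the combined weight is computed inside forward with ordinary tensor operations and passed to torch.nn.functional.linear. One likely cause of None gradients is assigning the combination to layer.weight by wrapping it in a new nn.Parameter or copying it through .data, either of which cuts it off from W1, W2, and alpha. Note that F.linear computes inputs @ W.T, i.e. the row-vector form of the formula above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterpolatedLinear(nn.Module):
    """Hypothetical linear layer whose weight is alpha*W1 + (1-alpha)*W2."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # W1, W2 and alpha are leaf parameters, so they receive .grad.
        self.W1 = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.W2 = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        # Recompute the weight on every forward pass with plain tensor ops,
        # so autograd records how W depends on W1, W2 and alpha.
        W = self.alpha * self.W1 + (1 - self.alpha) * self.W2
        return F.linear(x, W)  # computes x @ W.T

layer = InterpolatedLinear(4, 3)
x = torch.randn(8, 4)
loss = layer(x).pow(2).sum()
loss.backward()
print(layer.W1.grad.shape, layer.W2.grad.shape, layer.alpha.grad)
```

At alpha = 0.5 the gradients of W1 and W2 coincide, because each is the gradient with respect to W scaled by alpha or 1 - alpha respectively.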

However, the example above is only the simplest case. What I really want to do is parametrize every weight in a deep neural network (conv weights, linear weights, batch-norm weights, etc.) by the parameter alpha. Also, the number of Wi may be greater than 2, and the interpolation could be much more complicated than a linear combination. For example, rather than

W=alpha*W1+(1-alpha)*W2

I would like

W=f1(alpha)*W1+f2(alpha)*W2+...+fk(alpha)*Wk
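
The same pattern extends to k basis weights, arbitrary coefficient functions, and conv layers, since torch.nn.functional.conv2d also takes the kernel as an argument. In this sketch the class MixedConv2d and the coefficient function example_coeffs are hypothetical; the softmax-based choice of f1..fk is only an assumption for illustration, and any differentiable torch expression of alpha could be used instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedConv2d(nn.Module):
    """Hypothetical conv layer with kernel W = f1(alpha)*W1 + ... + fk(alpha)*Wk."""

    def __init__(self, in_ch, out_ch, kernel_size, k, coeff_fn, padding=0):
        super().__init__()
        shape = (out_ch, in_ch, kernel_size, kernel_size)
        # One trainable candidate kernel per basis weight W1..Wk.
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(shape) * 0.1) for _ in range(k)]
        )
        self.alpha = nn.Parameter(torch.tensor(0.3))
        self.coeff_fn = coeff_fn  # maps scalar alpha -> tensor of k coefficients
        self.padding = padding

    def forward(self, x):
        coeffs = self.coeff_fn(self.alpha)         # shape (k,)
        stacked = torch.stack(list(self.weights))  # shape (k, out, in, kh, kw)
        # Contract over the k axis: W = sum_i coeffs[i] * W_i
        W = torch.tensordot(coeffs, stacked, dims=1)
        return F.conv2d(x, W, padding=self.padding)

def example_coeffs(alpha):
    # Hypothetical f1, f2, f3: a softmax over three alpha-dependent logits,
    # so the coefficients are positive and sum to 1.
    logits = torch.stack([alpha, alpha ** 2, torch.sin(alpha)])
    return torch.softmax(logits, dim=0)

conv = MixedConv2d(2, 4, kernel_size=3, k=3, coeff_fn=example_coeffs, padding=1)
x = torch.randn(1, 2, 5, 5)
out = conv(x)
out.sum().backward()
print(out.shape, conv.alpha.grad)
```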

W can be the weight not only of a linear layer, but also of a conv layer and so on. I wonder whether it is still feasible to write out the formula explicitly in this situation, which is why I would like to know whether we can manipulate the layer weights directly.
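
For a whole network, one option that avoids rewriting every layer is torch.func.functional_call (PyTorch 2.0 and later; older releases have torch.nn.utils.stateless.functional_call). It runs an existing module with a dictionary of substitute parameter tensors, so the interpolated weights can be built per parameter name and stay in the graph. This sketch assumes k copies of the same architecture supply W1..Wk; make_net, template, coeffs, and interpolated_params are hypothetical names, and the coefficient functions are placeholders.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

def make_net():
    # Hypothetical architecture with conv, batch-norm and linear weights.
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(8 * 4 * 4, 10),
    )

k = 3
bases = nn.ModuleList([make_net() for _ in range(k)])  # supply W1..Wk
template = make_net()  # only its structure and buffers are used
alpha = nn.Parameter(torch.tensor(0.2))

def coeffs(a):
    # Placeholder f1..fk; any differentiable functions of alpha work here.
    return torch.softmax(torch.stack([a * i for i in range(k)]), dim=0)

def interpolated_params(a):
    c = coeffs(a)
    named = [dict(m.named_parameters()) for m in bases]
    # For every parameter name (conv weight, BN weight/bias, linear weight, ...)
    # build sum_i f_i(alpha) * W_i with ordinary tensor ops.
    return {name: sum(c[i] * named[i][name] for i in range(k)) for name in named[0]}

x = torch.randn(2, 3, 4, 4)
# Run `template` with the interpolated tensors in place of its own parameters;
# its batch-norm running statistics (buffers) are still used and updated.
out = functional_call(template, interpolated_params(alpha), (x,))
out.sum().backward()
```

After backward(), alpha and every parameter in bases have gradients, while the template's own parameters are never used.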