# How to take a weighted combination of models and calculate the gradient wrt the combination weights?

Suppose that I have two models, M1 and M2. I want to make a joint model M = a1\*M1 + a2\*M2, and later I want to calculate the gradient of M wrt a1 and a2 (i.e. wrt the combination weights).

Could anyone suggest how to do that?

Thanks

If you define `a1` and `a2` as tensors with `requires_grad=True` or as `nn.Parameters`, Autograd will compute their gradients in the backward pass.
I’m not sure how the "gradient of `M` wrt `aX`" would be calculated in your example, but if you compute a loss using `M`, you would be able to compute the gradient of this loss function with respect to the parameters.

I’m not familiar with your use case, but note that optimizing these parameters (`aX`) could just push them to negative values, which would then “minimize” the loss.
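As a minimal sketch of this suggestion (using two toy `nn.Linear` sub-models and an MSE loss purely for illustration), you can define the combination weights as `nn.Parameter`s, build the joint output as a weighted sum, and let Autograd populate their `.grad` attributes:

```python
import torch
import torch.nn as nn

# Toy sub-models standing in for M1 and M2 (shapes chosen for illustration)
m1 = nn.Linear(4, 1)
m2 = nn.Linear(4, 1)

# Combination weights as learnable parameters
a1 = nn.Parameter(torch.tensor(0.5))
a2 = nn.Parameter(torch.tensor(0.5))

x = torch.randn(8, 4)
target = torch.randn(8, 1)

# Joint model output: weighted sum of the sub-model outputs
out = a1 * m1(x) + a2 * m2(x)

# A loss computed from the joint output; backward() fills a1.grad and a2.grad
loss = nn.functional.mse_loss(out, target)
loss.backward()

print(a1.grad, a2.grad)
```

To actually optimize `a1` and `a2`, you would pass them to an optimizer, e.g. `torch.optim.SGD([a1, a2], lr=0.1)`.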

Thanks, @ptrblck, for the reply. My main concern is how to build the joint model M so that it knows about all of M1, M2, etc.
I tried `M = a1*M1.parameters() + a2*M2.parameters()`,
but then model M does not know about M1 and M2.
My objective is to learn a joint model that minimizes the loss, using a convex combination with weights a1 and a2.

Thanks

This line suggests that you want to do parameter averaging with replicated networks. That won’t work: separate training runs of the same network converge to distinct local minima, so averaging their parameters is not meaningful.

OTOH, averaging the network *outputs* (an ensemble) can work, but you normally need staged training for that: the combination weights should be trained after the contributing models have been independently pre-trained.