Sharing a parameter between multiple loss functions/graphs

I am trying to train several networks on similar datasets, and there are constraints on the outputs that are common to all of them, even though the actual network outputs will differ. However, I don’t know the details of the relationship beforehand, so I want to hand the parameters associated with the constraint to the optimizer and let it update them during training. I am envisioning something (very loosely) like this:

import numpy as np
import torch
import torch.nn as nn

def constraint(y, param_mat):
    # apply the learnable constraint matrix to the network output
    return torch.mm(param_mat, y)

# learnable constraint matrix, shared across all networks
param_mat = nn.Parameter(torch.tensor(np.random.normal(0, 1, (y_size, y_size)), dtype=torch.float32))

for i in range(num_nets):
    ipt = batch[0][i]
    target = batch[1][i]
    y = model_list[i](ipt)
    loss = loss1(y, target) + loss2(constraint(y, param_mat))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Will this type of setup cause the parameter to be added to the computation graph and updated by the optimizer?

You would have to add param_mat to an optimizer (or create a new one for it) so that it can be optimized.
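For example (a minimal sketch, not your exact code: model, y_size, and the learning rate are placeholders), you can either include param_mat when constructing the optimizer, or register it on an existing optimizer with add_param_group:

import torch
import torch.nn as nn
import torch.optim as optim

y_size = 4                          # placeholder dimension
model = nn.Linear(y_size, y_size)   # stand-in for one of your networks
param_mat = nn.Parameter(torch.randn(y_size, y_size))

# option 1: pass param_mat alongside the model parameters at construction
optimizer = optim.Adam(list(model.parameters()) + [param_mat], lr=1e-3)

# option 2: add it to an optimizer that already exists
# (use one of the two options; the same tensor shouldn't appear in two param groups of one optimizer)
# optimizer.add_param_group({"params": [param_mat]})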

So perhaps something like this?

def constraint(y, param_mat):
    return torch.mm(param_mat, y)

# shared learnable constraint matrix
param_mat = nn.Parameter(torch.tensor(np.random.normal(0, 1, (y_size, y_size)), dtype=torch.float32))

# each optimizer gets its own model's parameters plus the shared param_mat
param_list = [list(model_list[i].parameters()) + [param_mat] for i in range(num_nets)]
optimizers_list = [optim.Adam(param_list[i]) for i in range(num_nets)]

for i in range(num_nets):
    ipt = batch[0][i]
    target = batch[1][i]
    y = model_list[i](ipt)
    loss = loss1(y, target) + loss2(constraint(y, param_mat))
    optimizers_list[i].zero_grad()
    loss.backward()
    optimizers_list[i].step()

This approach would pass the same param_mat to all optimizers, each of which would update it with its gradient; I assume that’s the use case you are looking for.
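If it helps, here is a self-contained toy version of that setup (two tiny linear models, random data, and plain MSE standing in for your loss1/loss2), just to show that the shared param_mat receives a gradient from each model’s loss and is stepped by each optimizer:

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
y_size, num_nets = 4, 2

def constraint(y, param_mat):
    return torch.mm(param_mat, y)

model_list = [nn.Linear(y_size, y_size) for _ in range(num_nets)]
param_mat = nn.Parameter(torch.randn(y_size, y_size))

# each optimizer sees its own model's parameters plus the shared param_mat
optimizers_list = [
    optim.Adam(list(model_list[i].parameters()) + [param_mat], lr=1e-2)
    for i in range(num_nets)
]

loss_fn = nn.MSELoss()  # stand-in for loss1/loss2

for i in range(num_nets):
    ipt = torch.randn(y_size, y_size)     # toy batch
    target = torch.randn(y_size, y_size)
    y = model_list[i](ipt)
    loss = loss_fn(y, target) + loss_fn(constraint(y, param_mat), target)
    optimizers_list[i].zero_grad()
    loss.backward()
    optimizers_list[i].step()
    print(f"net {i}: param_mat grad norm = {param_mat.grad.norm():.4f}")

Two small things to keep in mind with this layout: since param_mat sits in every optimizer, each optimizers_list[i].zero_grad() also clears its gradient, so one model’s constraint gradient doesn’t leak into the next model’s step; and each Adam instance keeps its own momentum/variance state for param_mat, so the effective update is not quite the same as driving it with a single shared optimizer.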