Question about soft parameter sharing via an extra loss term

Hi all, I am trying to implement soft parameter sharing and add it to my loss as a regularization term:

import torch.nn.functional as F

def weight_assign_loss(module, graph1, graph2):
    # Sum the MSE between matching entries of the two graphs' layers.
    loss = 0
    # Second conv layer of each graph
    dict_l2_g1 = module.convl2[graph1].state_dict()
    dict_l2_g2 = module.convl2[graph2].state_dict()
    for key in dict_l2_g1:
        loss += F.mse_loss(dict_l2_g1[key], dict_l2_g2[key])
    # Third conv layer of each graph
    dict_l3_g1 = module.convl3[graph1].state_dict()
    dict_l3_g2 = module.convl3[graph2].state_dict()
    for key in dict_l3_g1:
        loss += F.mse_loss(dict_l3_g1[key], dict_l3_g2[key])
    return loss

That is my function, which is based on this page:
However, when I add this loss to the original loss, this term never decreases.

If I use it on its own instead of adding it to the original loss, I get this error:
element 0 of tensors does not require grad and does not have a grad_fn

Could anyone please help me figure out why? Thanks.
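
A possible explanation, offered as a sketch rather than a confirmed diagnosis: by default `state_dict()` returns detached tensors (`keep_vars=False`), so a loss built from them has no `grad_fn`. Added to the original loss it acts as a constant and would never decrease; used alone, `backward()` fails with the error above. The variant below reads the live parameters through `named_parameters()` instead. The name `weight_sharing_loss`, the `ToyModule` class, the `"g1"`/`"g2"` keys, and the use of `nn.ModuleDict` with `nn.Linear` layers are assumptions made only so the example runs; the real model will differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def weight_sharing_loss(module, graph1, graph2):
    """Soft parameter-sharing penalty between two graph-specific branches."""
    loss = 0.0
    for layers in (module.convl2, module.convl3):
        # named_parameters() yields the live nn.Parameter objects, so the
        # penalty stays attached to the autograd graph.
        params_g1 = dict(layers[graph1].named_parameters())
        params_g2 = dict(layers[graph2].named_parameters())
        for name, p1 in params_g1.items():
            loss = loss + F.mse_loss(p1, params_g2[name])
    return loss


# Hypothetical stand-in for the real model: the post only shows that
# convl2 and convl3 can be indexed by a graph identifier.
class ToyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.convl2 = nn.ModuleDict({"g1": nn.Linear(4, 4), "g2": nn.Linear(4, 4)})
        self.convl3 = nn.ModuleDict({"g1": nn.Linear(4, 2), "g2": nn.Linear(4, 2)})


torch.manual_seed(0)
toy = ToyModule()

# Penalty built from live parameters: backward() works on it alone.
reg = weight_sharing_loss(toy, "g1", "g2")
reg.backward()

# Same comparison through state_dict() defaults: the tensors are detached.
detached = F.mse_loss(
    toy.convl2["g1"].state_dict()["weight"],
    toy.convl2["g2"].state_dict()["weight"],
)
print(reg.requires_grad, detached.requires_grad)
```

In a training loop this term would typically be scaled by a weight and added to the task loss before calling `backward()`. Passing `keep_vars=True` to `state_dict()` should also keep the tensors attached, though that would include buffers as well as parameters.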