[Please Help!] How can I copy the gradient from Net A to Net B?

I have two networks with the same architecture, say A and B. For security reasons, net B cannot access the training data, so I need to train A and update B with A’s gradients.

Note that we cannot just copy A’s weights to B all the time, because A keeps being updated during training.

You can access the model parameters with model.named_parameters(). It can look something like the snippet below. I haven’t tried it, but it shows the general syntax.

for net1, net2 in zip(A.named_parameters(), B.named_parameters()):
    net2[1].data.grad = net1[1].data.grad

Thank you for your quick reply.
I tried it and it works, but another problem occurs: after copying the gradients from A to B, how can I apply them so that B actually gets updated? I tried opt_B.step(), but it doesn’t work.

You first need to create an optimizer for model B, then call optimizer.step():

import torch.optim as optim

opt_B = optim.Adam(B.parameters(), lr=args.lr)
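
A minimal sketch of the intended order (untested; criterion, img, and label stand for whatever your training loop already uses):

opt_B.zero_grad()
loss = criterion(A(img), label)   # forward and backward pass run through A only
loss.backward()
# ... copy A's gradients into B's parameters here ...
opt_B.step()                      # applies the copied gradients to B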

Yes, that’s exactly what I do for model B, but it doesn’t work.

        for batch_idx, data in enumerate(self.dataset_s[0]):
            img = data['img']
            label = data['label']
            img = Variable(img.cuda())
            label = Variable(label.long().cuda())
            self.opt_g_s[0].zero_grad()
            self.opt_c_s[0].zero_grad()
            self.opt_g_t.zero_grad()
            self.opt_c_t.zero_grad()
            # output = self.C_s[0](self.G_s[0](img))
            f = self.G_s[0](img)
            output = self.C_s[0](f)

            f_t = self.G_t(img)
            output_t = self.C_t(f_t)

            loss = criterion(output, label)
            loss_t = criterion(output_t, label)
            loss.backward()
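            # copy A's gradients (G_s[0], C_s[0]) into B's parameters (G_t, C_t)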
            for net1, net2 in zip(self.G_s[0].named_parameters(), self.G_t.named_parameters()):
                # print(net1)
                net2[1].grad = net1[1].grad
            for net1, net2 in zip(self.C_s[0].named_parameters(), self.C_t.named_parameters()):
                net2[1].grad = net1[1].grad
            self.opt_g_t.step()
            self.opt_c_t.step()
            self.opt_g_s[0].step()
            self.opt_c_s[0].step()
            print(loss_t.data[0], loss.data[0])

Here is the code: G_s and C_s belong to net A, G_t and C_t belong to net B. output and loss are net A’s output and loss; output_t and loss_t are net B’s output and loss.

Ideally, I would expect loss_t to decrease in the same way as loss, but unfortunately this is what I got (loss_t on the left, loss on the right):

2.325899600982666 2.311429977416992
2.302957773208618 2.259328603744507
2.338597059249878 2.1615893840789795
2.310314178466797 1.9996391534805298
2.3391530513763428 1.930885672569275
2.3019471168518066 1.79799485206604
2.312781810760498 1.7417033910751343
2.315269708633423 1.7081165313720703
2.3183462619781494 1.5497541427612305
2.3130362033843994 1.5103309154510498
2.3282203674316406 1.3914222717285156
2.311227560043335 1.425149917602539
2.32726788520813 1.3116666078567505
2.3288776874542236 1.175093412399292
2.3100202083587646 1.1519544124603271
2.3509089946746826 1.1114654541015625
2.3917856216430664 1.0374433994293213
2.3640432357788086 0.9599748253822327
2.3335037231445312 0.9961658716201782
2.336228847503662 1.1022270917892456
2.3434970378875732 0.9420759677886963
2.314120292663574 0.9514802694320679
2.3666858673095703 0.9194179773330688
2.312129497528076 0.9283604621887207
2.3348450660705566 0.8091476559638977
2.396702766418457 0.8768067359924316
2.3209187984466553 0.8434578776359558
2.4188976287841797 0.645444393157959
2.420340061187744 0.7303286194801331
2.3238790035247803 0.831498384475708
2.3971896171569824 0.8141204714775085
2.3836135864257812 0.6898840069770813
2.33423113822937 0.6100970506668091
2.361891269683838 0.6768393516540527
2.3814330101013184 0.7510046362876892
2.3523271083831787 0.6501622200012207
2.335498094558716 0.6775341033935547
2.3763034343719482 0.5295536518096924
2.3889424800872803 0.5690212845802307
2.4591455459594727 0.5898173451423645
2.369767665863037 0.7039562463760376
2.4159860610961914 0.5748851299285889
2.3444278240203857 0.37794822454452515
2.394176483154297 0.5638140439987183
2.426819324493408 0.43249520659446716
2.357060432434082 0.5665342211723328
2.391160488128662 0.47172683477401733
2.4093570709228516 0.43452709913253784
2.3887462615966797 0.40173453092575073
2.471668243408203 0.4635242223739624
2.3329155445098877 0.5089845657348633
2.344878911972046 0.45018270611763
2.377624273300171 0.5257216095924377

I think you should also clone the gradients. Can you change it like below? Without clone(), the data is not explicitly copied to the destination, I guess.

for net1, net2 in zip(A.named_parameters(), B.named_parameters()):
    net2[1].data.grad = net1[1].data.grad.clone()

Weird requirement :thinking:, did you solve the problem?

Thanks.
This works:

for net1, net2 in zip(A.named_parameters(), B.named_parameters()):
    net2[1].grad = net1[1].grad.clone()

Yeah, I finally found the reason.

Network B is updated correctly from network A.
The reason the loss of network B was not decreasing is that net B’s initialization was different from net A’s. After I initialized them with the same parameters, the loss of net B decreases in sync with net A’s.
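
For reference, a minimal sketch of one way to start both networks from the same parameters (toy nn.Linear modules standing in for the real nets):

import torch.nn as nn

A = nn.Linear(10, 2)
B = nn.Linear(10, 2)                 # same architecture, different random init
B.load_state_dict(A.state_dict())    # B now starts from A's parameters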


Glad to know!

What about removing the .clone()?
I mean, just use net2[1].grad = net1[1].grad so they point to the same tensor in memory. I am not sure whether it uses less memory than the present implementation; you could give it a try.

Yeah, I think it saves more memory, since without clone() it is not a deep copy.
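
For anyone curious, a quick way to check the difference (a minimal sketch with toy nn.Linear modules standing in for the real networks): without clone() both parameters share the same gradient tensor, while with clone() each gets its own storage.

import torch
import torch.nn as nn

A, B = nn.Linear(4, 2), nn.Linear(4, 2)
A(torch.randn(1, 4)).sum().backward()       # populate A's gradients

for p_a, p_b in zip(A.parameters(), B.parameters()):
    p_b.grad = p_a.grad                     # alias: shares A's gradient memory
print(A.weight.grad.data_ptr() == B.weight.grad.data_ptr())   # True

for p_a, p_b in zip(A.parameters(), B.parameters()):
    p_b.grad = p_a.grad.clone()             # copy: separate storage
print(A.weight.grad.data_ptr() == B.weight.grad.data_ptr())   # False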

Why not accept the solution and close the topic as solved?