How to port MatConvNet code to PyTorch

Hi guys, I've run into a problem. I have a function written in MATLAB (MatConvNet), but I need to implement it in PyTorch. Here is the function.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Join the gradients (from the
% discriminator and the generator)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function d = join_gradients(dgdz_d, dgdz_f, kappa)
    % check kappa's range
    if kappa<0 || kappa>1
        error('kappa has to be in the [0,1] range!');
    end
    % join gradients
    d = dgdz_d;
    for d_i = 1:size(dgdz_d,1)
        d1 = dgdz_d(d_i,:,:,:);
        d2 = dgdz_f(d_i,:,:,:);
        if norm(d1(:))>norm(d2(:))
            d1 = d1*norm(d2(:))/norm(d1(:));
        else
            d2 = d2*norm(d1(:))/norm(d2(:));
        end
        d(d_i,:,:,:) = kappa*d1 + (1-kappa)*d2;
    end
end

Let me explain the meaning of this function. dgdz_d is the gradient computed from one loss, named loss_1, and dgdz_f is the gradient computed from another loss, named loss_2. kappa is a coefficient that balances the two gradients.
Could I just use

loss = kappa * loss_1 + (1.0 - kappa) * loss_2
loss.backward()

as a substitute for this function in PyTorch?
If not, could you give me advice on how to implement it in PyTorch? Thanks a lot.

Hi,

Doing the weighting on the loss will definitely match the behavior of d(d_i,:,:,:) = kappa*d1 + (1-kappa)*d2; yes.
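
For instance, here is a minimal sketch (with a made-up scalar parameter w and toy losses) showing that, by linearity of the gradient, backpropagating the weighted sum of the losses gives the weighted sum of the two gradients:

import torch

w = torch.tensor(2.0, requires_grad=True)
loss_1 = w ** 2       # d(loss_1)/dw = 2*w = 4
loss_2 = 3.0 * w      # d(loss_2)/dw = 3
kappa = 0.7

loss = kappa * loss_1 + (1.0 - kappa) * loss_2
loss.backward()
print(w.grad)  # tensor(3.7000) == kappa*4 + (1-kappa)*3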

But there seems to be some renormalization happening here before the gradients are combined. I'm not sure I understand what it's trying to do, though.

Yes, in the MATLAB code, the two gradients are normalized before they are joined. I think the aim of the normalization is to scale the two gradients to the same magnitude.

if norm(d1(:))>norm(d2(:))
    d1 = d1*norm(d2(:))/norm(d1(:));
else
    d2 = d2*norm(d1(:))/norm(d2(:));
end

If the norm of d1 is bigger than that of d2, d1 is scaled down to the same magnitude as d2 (and vice versa). Because d1 and d2 then have the same magnitude, kappa works better as a blending coefficient.
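The scaling itself is easy to check in PyTorch, e.g. this minimal sketch with random tensors standing in for the two gradients:

import torch

d1 = torch.randn(4, 3, 8, 8) * 10.0  # the larger "gradient"
d2 = torch.randn(4, 3, 8, 8)         # the smaller "gradient"

# scale d1 down to the magnitude of d2
d1 = d1 * torch.norm(d2) / torch.norm(d1)
print(torch.norm(d1), torch.norm(d2))  # the two norms now match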
But how can I apply the same logic to the actual per-parameter gradients in PyTorch?

Well, if you want to do that, you cannot use a single loss.
You will have to backward each loss independently and save the gradients (with .clone()).
Then you can write the same logic that checks the norm of each gradient and computes the final gradient.
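
As a rough sketch of that recipe (assuming gen is your generator module and loss_1/loss_2 are already computed):

# backward the first loss, keeping the graph for the second backward
gen.zero_grad()
loss_1.backward(retain_graph=True)
grads_1 = [p.grad.clone() for p in gen.parameters()]

# backward the second loss
gen.zero_grad()
loss_2.backward()
grads_2 = [p.grad.clone() for p in gen.parameters()]

# then rescale the larger gradient and combine them per parameter,
# exactly as in the MATLAB loop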

Thanks a lot.
Following your advice, I wrote the following code in PyTorch.

import torch
from torch import autograd

# compute the gradients of the two losses w.r.t. the generator's parameters
grads_loss_1 = autograd.grad(loss_1, gen.parameters(), retain_graph=True)
grads_loss_2 = autograd.grad(loss_2, gen.parameters(), retain_graph=False)

# update the generator's parameter gradients
for idx, p in enumerate(gen.parameters()):
    grad_1 = grads_loss_1[idx]
    grad_2 = grads_loss_2[idx]
    norm_1 = torch.norm(grad_1, p=2)
    norm_2 = torch.norm(grad_2, p=2)
    # scale the larger gradient down to the magnitude of the smaller one
    if norm_1 > norm_2:
        grad_1 = grad_1 * norm_2 / norm_1
    else:
        grad_2 = grad_2 * norm_1 / norm_2
    # join the gradients: kappa*d1 + (1-kappa)*d2, as in the MATLAB code
    p.grad = (kappa * grad_1 + (1.0 - kappa) * grad_2).clone()

It works well in my project. Please point out any errors you find.
Thanks again for your suggestions!

Hi,

No errors that I can see; it looks quite good!
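
One small usage note (optimizer here is a hypothetical name for whatever optimizer you use on gen): since the loop writes the joined gradients into p.grad directly, the optimizer step just has to come after it:

# sketch of one training step using the gradient-joining loop above
optimizer.zero_grad()
# ... compute loss_1 and loss_2, then run the joining loop that sets p.grad ...
optimizer.step()  # applies the joined gradients stored in p.grad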