Hi guys, I've run into a problem. I have a function in MATLAB that I need to implement in PyTorch. Here is the function:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Join the gradients (from the
% discriminator and the generator)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function d = join_gradients(dgdz_d, dgdz_f, kappa)
    % check kappa's range
    if kappa<0 || kappa>1
        error('kappa has to be in the [0,1] range!');
    end
    % join gradients
    d = dgdz_d;
    for d_i = 1:size(dgdz_d,1)
        d1 = dgdz_d(d_i,:,:,:);
        d2 = dgdz_f(d_i,:,:,:);
        if norm(d1(:))>norm(d2(:))
            d1 = d1*norm(d2(:))/norm(d1(:));
        else
            d2 = d2*norm(d1(:))/norm(d2(:));
        end
        d(d_i,:,:,:) = kappa*d1 + (1-kappa)*d2;
    end
end
Let me explain what this function does. dgdz_d is the gradient computed from one loss, loss_1, and dgdz_f is the gradient computed from another loss, loss_2. kappa is a coefficient that balances the two gradients.
Could I just use
Yes, in the MATLAB code the two gradients are rescaled before they are joined. I think the aim of this normalization is to bring the two gradients to the same magnitude.
if norm(d1(:))>norm(d2(:))
    d1 = d1*norm(d2(:))/norm(d1(:));
else
    d2 = d2*norm(d1(:))/norm(d2(:));
end
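If the norm of d1 is bigger than the norm of d2, d1 is scaled down to the same magnitude as d2; otherwise d2 is scaled down to the magnitude of d1. Once d1 and d2 have the same magnitude, kappa balances them as intended instead of the larger gradient dominating.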
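Put differently, both branches of the if bring the two gradients to the smaller of the two norms. As a compact summary (my own notation, assuming both norms are non-zero), the joined gradient for one sample can be written as:

```latex
m = \min\bigl(\lVert d_1 \rVert,\ \lVert d_2 \rVert\bigr), \qquad
d = \kappa \,\frac{m}{\lVert d_1 \rVert}\, d_1 \;+\; (1-\kappa)\,\frac{m}{\lVert d_2 \rVert}\, d_2
```

The gradient with the smaller norm gets a factor of exactly 1, so only the larger one is actually changed, which matches the if/else in the MATLAB code.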
So, how can I do the same thing in PyTorch?
Well, if you want to do that, you cannot use a single loss.
You will have to call backward on each loss independently and save the resulting gradients (with .clone(), because the next backward call accumulates into the same .grad tensor).
Then you can apply the same logic as the MATLAB code: check the norm of each gradient and compute the final gradient.
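A minimal sketch of that recipe, in case it helps. The tensor z and the two placeholder losses are made up for illustration (they stand in for whatever your model produces), and the vectorized join_gradients is my own port of the MATLAB function, not an existing PyTorch API. It assumes the first dimension is the batch dimension, as in the MATLAB loop.

```python
import torch


def join_gradients(dgdz_d, dgdz_f, kappa):
    """Vectorized port of the MATLAB join_gradients (dim 0 is the batch)."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa has to be in the [0,1] range!")

    # Per-sample L2 norm over all non-batch dims, like norm(d1(:)) in MATLAB.
    n_d = torch.linalg.vector_norm(dgdz_d.flatten(1), dim=1)
    n_f = torch.linalg.vector_norm(dgdz_f.flatten(1), dim=1)

    # Only the gradient with the larger norm is scaled down; the other
    # keeps a factor of 1, mirroring the if/else in the MATLAB code.
    d_is_larger = n_d > n_f
    scale_d = torch.where(d_is_larger, n_f / n_d, torch.ones_like(n_d))
    scale_f = torch.where(d_is_larger, torch.ones_like(n_f), n_d / n_f)

    # Reshape the per-sample factors so they broadcast over the other dims.
    shape = (-1,) + (1,) * (dgdz_d.dim() - 1)
    return (kappa * dgdz_d * scale_d.view(shape)
            + (1 - kappa) * dgdz_f * scale_f.view(shape))


# Hypothetical setup: z is the tensor the gradients are taken with respect to,
# and the two losses are placeholders for loss_1 and loss_2.
torch.manual_seed(0)
z = torch.randn(4, 3, 8, 8, requires_grad=True)
loss_1 = (z ** 2).sum()
loss_2 = (10.0 * z).sin().sum()

# Backward each loss on its own and save a copy of the gradient.
# retain_graph=True only matters if loss_2 shares part of its graph with loss_1.
loss_1.backward(retain_graph=True)
dgdz_d = z.grad.clone()
z.grad.zero_()  # otherwise the second backward would add to the first gradient
loss_2.backward()
dgdz_f = z.grad.clone()

# Join the two gradients and write the result back into z.grad.
joined = join_gradients(dgdz_d, dgdz_f, kappa=0.5)
z.grad.copy_(joined)
```

After the last line, z.grad holds the joined gradient, so an optimizer that has z among its parameters would use it on the next step(). For model parameters you would repeat the same save/zero/join steps per parameter, although the MATLAB code suggests the gradient is taken with respect to a batched tensor rather than the weights.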