Can I combine two loss functions into one, with one being a PyTorch tensor and the other a numpy computation?

L = f1 + f2
where f1 works with PyTorch tensors and its output is a tensor,
and f2 uses some numpy function.

I am calling L.backward(). I am not getting any error, and the network does seem to be affected by introducing f2 into the loss during training.
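
Roughly, the setup looks something like this (a minimal sketch with made-up shapes and a placeholder numpy computation, not my actual code):

import numpy as np
import torch

x = torch.randn(4, 3, requires_grad=True)
target = torch.randn(4, 3)

# f1: pure torch ops, tracked by autograd
f1 = ((x - target) ** 2).mean()

# f2: computed with numpy
f2 = float(np.abs(x.detach().numpy() - target.numpy()).mean())

L = f1 + f2
L.backward()  # no error is raised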

Is everything ok here, or is something fishy?
Thanks.

Hi,

Since f2 is not computed in a differentiable manner, from the autograd point of view it is the same as adding a constant number to your loss.
When taking derivatives, a constant does not change anything, so you could remove f2 from that line and see exactly the same behavior with respect to the gradients that are computed.
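
For example, here is a quick toy check (hypothetical tensors, not your actual loss) showing that a numpy-computed term behaves exactly like a constant as far as gradients are concerned:

import numpy as np
import torch

x = torch.randn(5, requires_grad=True)

f1 = (x ** 2).sum()
f2 = float(np.sin(x.detach().numpy()).sum())  # numpy term, invisible to autograd

(f1 + f2).backward()
grad_with_f2 = x.grad.clone()

x.grad = None
(x ** 2).sum().backward()  # same loss without f2

print(torch.equal(grad_with_f2, x.grad))  # True: identical gradients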


@albanD Thanks a lot for your insight.
1- What will happen with the following function?

def f1(tensor1, tensor2):
    H = tensor1 - tensor2
    # some operation in the numpy domain (detached from the autograd graph)
    NUM = (tensor1.data.cpu().numpy() - tensor2.data.cpu().numpy()) ** 2
    return H - NUM

L = f1(tensor1, tensor2)
L.backward()  # assuming tensor1 and tensor2 are scalars; otherwise reduce first, e.g. L.sum()

Here the loss is affected by NUM differently every time, depending on tensor1 and tensor2.

2- If f2 = NUM and f1 = H,
and L = f1 + f2,
is L in step 2 the same as L in step 1 in terms of training?

3- My training is positively affected by the setup in my original post. What could be the reason for that?
Thanks.

Hi,

  1. When you use .data (which you should never use in general :wink: ) you break the “link” with tensor1 and tensor2 from the point of view of the autograd, and so what you do there is ignored for gradient computations (see the sketch below).
    Here you could replace NUM with a constant like 42 and you would get the same gradients.

  2. No, because in 1 you do H - NUM but here you do H + NUM. Otherwise, yes, they will behave the same.

  3. Gradients do flow to H (because you compute it in a differentiable manner), and so that part will influence your training.
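
Here is a small sketch of point 1, using .detach() (preferred over .data) and made-up inputs:

import torch

tensor1 = torch.randn(3, requires_grad=True)
tensor2 = torch.randn(3, requires_grad=True)

H = tensor1 - tensor2
NUM = (tensor1.detach() - tensor2.detach()) ** 2  # cut from the graph

print(H.grad_fn)    # <SubBackward0 ...>: part of the autograd graph
print(NUM.grad_fn)  # None: autograd treats NUM as a constant

(H - NUM).sum().backward()
print(tensor1.grad)  # all ones: only H contributed, d(sum(H))/d(tensor1) = 1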