A strange problem I met when defining a new loss function

Hi everybody, I ran into a strange problem while trying to define a weighted binary cross entropy myself.

My loss function is simple:

def weighted_BCE(sigmoid_x, targets):
    # a weighted binary cross entropy
    assert sigmoid_x.size() == targets.size()  # make sure input and label have the same shape
    count_p = (targets == 1.0).sum() + 1  # count 1s (positive); add 1 to avoid dividing by 0
    count_n = (targets == 0.0).sum() + 1  # count 0s (negative); add 1 to avoid dividing by 0
    # divide the positive part and the negative part by their respective counts
    loss = -((targets * sigmoid_x.log()) * (1 / count_p)) - (((1 - targets) * (1 - sigmoid_x).log()) * (1 / count_n))
    return loss.mean()

Then I try it with a random input and an all-zero label:

import numpy as np
import torch

if __name__ == '__main__':
    a = np.random.uniform(low=0.0, high=1.0, size=(3, 3))
    a = torch.from_numpy(a).float().to('cuda:0')
    a.requires_grad = True
    b = np.zeros((3, 3))
    b = torch.from_numpy(b).float().to('cuda:0')
    b.requires_grad = False
    loss = weighted_BCE(a, b)
    # calculate the loss the same way, without going through the function
    loss2 = torch.mean(-((b * a.log()) * (1 / 1)) - (((1 - b) * (1 - a).log()) * (1 / 10)))
    print((loss - loss2).item())
Here is the result, which was supposed to be zero:

-0.15303052961826324

It may be a silly question, but I can't figure out why the loss function returns 0.0 when computed inside the definition, yet gives a different result when I calculate it directly in the main function.

My environment is:

OS: Ubuntu 18.04.2 LTS
GPU: GTX 1080 Ti
PyTorch: 1.2.0


The logic you are using is correct; it is actually the data types that are causing this difference.
For instance, if you reuse the counts from your function in the main script:

count_p = (b == 1.0).sum() + 1
count_n = (b == 0.0).sum() + 1                
loss2 = torch.mean(-((b * a.log()) * (1 / count_p)) - (((1 - b) * (1 - a).log()) * (1 / count_n)))

that is, count_p instead of 1 and count_n instead of 10, you will get the same result. Note that if you write the float literals 1. or 10. instead, you will get a different result.

So the safest way to use constant numbers is the torch.tensor() method:

loss2 = torch.mean(-((b * a.log()) * (1 / torch.tensor([1]).cuda())) - (((1 - b) * (1 - a).log()) * (1 / torch.tensor([10]).cuda())))

torch.tensor() retains the data type, while .cuda() moves the tensor to the given device.
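To make the dtypes concrete, here is a minimal sketch (on CPU, so it runs without a GPU) showing that the comparison-and-sum count comes out as an integer tensor, and that casting it to float before taking the reciprocal keeps everything in floating point:

```python
import torch

b = torch.zeros(3, 3)             # all-zero labels, as in the example above
count_n = (b == 0.0).sum() + 1    # bool comparison -> sum gives an integer tensor
print(count_n.dtype)              # torch.int64

weight_n = 1 / count_n.float()    # cast to float before the division
print(weight_n.dtype)             # torch.float32
print(weight_n.item())            # ~0.1, i.e. 1 / 10
```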



Thank you, and sorry for my late response. In fact, what I want is a little different.
Based on your advice, I changed the

loss = -((targets * sigmoid_x.log()) * (1 / count_p)) - (((1 - targets) * (1 - sigmoid_x).log()) * (1 / count_n))

in my loss definition to

loss = -((targets * sigmoid_x.log()) * (1 / count_p.float())) - (((1 - targets) * (1 - sigmoid_x).log()) * (1 / count_n.float()))

and it gives me the right result.
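For completeness, here is the fixed function together with a CPU reproduction of the check from above (a sketch; I shifted the random input away from 0 and 1 so the logs stay finite):

```python
import torch

def weighted_BCE(sigmoid_x, targets):
    # weighted binary cross entropy, with the counts cast to float
    assert sigmoid_x.size() == targets.size()
    count_p = (targets == 1.0).sum() + 1
    count_n = (targets == 0.0).sum() + 1
    loss = -((targets * sigmoid_x.log()) * (1 / count_p.float())) \
           - (((1 - targets) * (1 - sigmoid_x).log()) * (1 / count_n.float()))
    return loss.mean()

a = torch.rand(3, 3) * 0.98 + 0.01   # values strictly inside (0, 1)
b = torch.zeros(3, 3)                # all-zero labels: count_p = 1, count_n = 10
loss = weighted_BCE(a, b)
loss2 = torch.mean(-((b * a.log()) * (1 / 1.)) - (((1 - b) * (1 - a).log()) * (1 / 10.)))
print((loss - loss2).item())         # ~0.0 now
```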

But I am still unsure about the int and float types here.
Since count_p is an int tensor and sigmoid_x.log() is a float tensor, shouldn't a float multiplied by an int result in a float? Or does something work differently with tensors?
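For example, the promotion itself can be checked directly (a sketch; on my setup the mixed multiplication does come out as float):

```python
import torch

count_n = torch.tensor(10)          # integer tensor, like (targets == 0.0).sum() + 1
x = torch.rand(3, 3)                # float32 tensor

print((count_n * x).dtype)          # torch.float32: int * float promotes to float

# the subtle expression is 1 / count_n: it is a Python int divided by an
# integer tensor, so no float tensor takes part in that division at all;
# casting the count first sidesteps the question entirely
print((1 / count_n.float()).dtype)  # torch.float32
```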


Actually, you are right about the casting to float. But something I am not sure about is system-dependent type sizes or other factors. I tried using 1. to convert to Python's default float type, and it did not work. I do not know much about C++ (PyTorch's backend) or Python's type system, which is why I suggested torch.tensor(): it automatically handles the type conversion, retaining the current type of a numpy ndarray passed as an argument.
I did not check with numpy float numbers, because the docs say PyTorch tensors and numpy ndarrays can be converted to each other seamlessly. (A bare constant number is pure Python, not numpy.)
For instance, numpy has both system-dependent and system-independent int types. All of these can have an effect, and an interface is needed to handle such situations, which I think torch.tensor() provides.

Something else I think needs checking is whether the GPU assumes different type sizes than normal CPU code.

My knowledge of these topics is very basic.
I think you could create a new question demonstrating the different results you get with a pure Python constant, a numpy array, and a tensor. The PyTorch dev team is really good and answers questions almost all the time.



Thank you for your kindness and patience. I will take your advice.

Just found something similar on GitHub.