Backward very slow

Hi, I built a loss function, and the backward pass is very slow:

import torch

def loss_function(p, y, b):
    # p: predictions, y: targets, both of shape (b, k); b: batch size
    eps = 1e-10
    losses = 0
    k = len(y[0])
    ones = torch.ones(k).cuda()
    for i in range(b):
        # negative-label term: -(1 - y) . log(1 - p + eps)
        loss1 = -((ones - y[i]) @ ((ones - p[i] + eps).log()))
        # per-class term whose minimum over the k classes is added
        prod = (ones - y[i]) * k - y[i] * ((p[i] + eps).log())
        loss2 = torch.min(prod)
        losses = losses + (loss2 + loss1)
    return losses / b

Also, what should be the ratio between the size of the dataset and the batch size?
I have ~300,000 rows in my dataset; maybe a better choice of batch size would help here?

My model is a CNN with 3 conv layers and 2 linear layers.

Hi,

The most likely reason is that the for loop you use in your loss function is slow and also creates many small operations in the backward pass.
Maybe you want to change it to work on the whole batch at once?

How?
This is a multi-label problem, so I calculate the distance between each prediction/label pair in the loop.

Assuming that your inputs are 2D, the code below shows how to do it:

import torch

def loss_function(p, y, b):
    # original per-sample loop version
    eps = 1e-10
    losses = 0
    k = len(y[0])
    ones = torch.ones(k)
    for i in range(b):
        loss1 = -((ones - y[i]) @ ((ones - p[i] + eps).log()))
        prod = (ones - y[i]) * k - y[i] * ((p[i] + eps).log())
        loss2 = torch.min(prod)
        losses = losses + (loss2 + loss1)
    return losses / b

def new_loss_function(p, y, b):
    # vectorized version: operates on the whole (b, k) batch at once
    eps = 1e-10
    k = len(y[0])
    ones = torch.ones(1, 1).expand(b, k)
    # the per-row dot product becomes an elementwise product summed over dim=1
    loss1 = -((ones - y) * ((ones - p + eps).log())).sum(dim=1)
    prod = (ones - y) * k - y * ((p + eps).log())
    # torch.min over dim=1 returns (values, indices); keep the values
    loss2 = torch.min(prod, dim=1)[0]
    losses = (loss1 + loss2).sum()
    return losses / b

b = 10
k = 5

# Are these the right input sizes?
p = torch.rand(b, k)
y = torch.rand(b, k)

old_res = loss_function(p, y, b)
new_res = new_loss_function(p, y, b)

# the two implementations should agree up to floating point error
print((old_res - new_res).abs().max())
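
If you want to check the speed difference yourself, here is a rough timing sketch that reuses the two functions above (the sizes are made up, and the absolute numbers will depend on your hardware):

import time

b, k = 256, 100
p = torch.rand(b, k, requires_grad=True)
y = torch.rand(b, k)

for fn in (loss_function, new_loss_function):
    start = time.perf_counter()
    for _ in range(10):
        loss = fn(p, y, b)
        loss.backward()
        p.grad = None
    print(fn.__name__, time.perf_counter() - start)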



Thank you so much!

Just a little question:
why torch.ones(1, 1).expand(x, y) and not torch.ones(x, y)?
Does it matter?

This is just a small trick to reduce memory use. The expand function does not actually allocate new memory; it uses stride tricks to build the final Tensor.
You can use either one and should not see any difference in the result. The one with expand will just use a bit less memory.
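
A small sketch to illustrate the difference (the sizes here are made up for illustration): the expanded tensor reports a stride of 0 along both dimensions, meaning every index reuses the single stored element, while torch.ones(b, k) allocates a full (b, k) buffer. One caveat: because all elements alias the same memory, an expanded tensor should not be written to in place.

import torch

b, k = 10, 5

expanded = torch.ones(1, 1).expand(b, k)   # a view: no new memory is allocated
allocated = torch.ones(b, k)               # a real (b, k) buffer

# stride 0 along a dimension means every index maps to the same stored element
print(expanded.stride())    # (0, 0)
print(allocated.stride())   # (5, 1)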


Hi, I have one more question about my loss, maybe you could help me.

Do you know how I can add L2 regularization to this loss?
Thanks!

The simplest way is just to use the weight_decay parameter of the optimizer, if you use one from torch.optim.

Otherwise, just compute the L2 penalty by hand and add it to the loss before running the backward.
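
A rough sketch of both options (the tiny model, the learning rate, and l2_lambda below are placeholders for illustration, not values from this thread; it reuses new_loss_function from above):

import torch
import torch.nn as nn

# stand-in model; your CNN would go here
model = nn.Linear(20, 5)
p = torch.sigmoid(model(torch.rand(10, 20)))
y = torch.rand(10, 5)

# Option 1: let the optimizer apply L2 regularization through weight_decay
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Option 2: compute the L2 penalty by hand and add it to the loss
# (use one option or the other, not both, to avoid regularizing twice)
l2_lambda = 1e-4
loss = new_loss_function(p, y, b=10)
l2_penalty = sum(param.pow(2).sum() for param in model.parameters())
loss = loss + l2_lambda * l2_penalty
loss.backward()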