Backward very slow

Hi, I built a loss function, and the backward pass is very slow:

import torch

def loss_function(p, y, b):
    # p: predictions, y: targets, both of shape (b, k); b: batch size
    eps = 1e-10
    losses = 0
    k = len(y[0])
    ones = torch.ones(k).cuda()
    for i in range(b):
        # negative-label term: -(1 - y) . log(1 - p + eps)
        loss1 = -((ones - y[i]) @ ((ones - p[i] + eps).log()))
        # per-class term whose minimum over the k classes is added
        prod = (ones - y[i]) * k - y[i] * ((p[i] + eps).log())
        loss2 = torch.min(prod)
        losses = losses + (loss2 + loss1)
    return losses / b

Also, what should be the ratio between the size of the dataset and the batch size?
I have ~300,000 rows in my dataset; maybe a better choice of batch size would help here?

My model is a CNN with 3 conv layers and 2 linear layers.

Hi,

The most likely reason is that the for loop you use in your loss function is slow and also creates many small operations in the backward pass.
Maybe you want to change it to work on the whole batch at once?

How?
This is a multi-label problem, so I calculate the distance between each prediction/label pair in the loop.

Assuming that your inputs are 2D, the code below shows how to do it:

import torch

def loss_function(p, y, b):
    # original per-sample loop version
    eps = 1e-10
    losses = 0
    k = len(y[0])
    ones = torch.ones(k)
    for i in range(b):
        loss1 = -((ones - y[i]) @ ((ones - p[i] + eps).log()))
        prod = (ones - y[i]) * k - y[i] * ((p[i] + eps).log())
        loss2 = torch.min(prod)
        losses = losses + (loss2 + loss1)
    return losses / b

def new_loss_function(p, y, b):
    # vectorized version: operates on the whole (b, k) batch at once
    eps = 1e-10
    k = len(y[0])
    ones = torch.ones(1, 1).expand(b, k)
    # the per-row dot product becomes an elementwise product summed over dim=1
    loss1 = -((ones - y) * ((ones - p + eps).log())).sum(dim=1)
    prod = (ones - y) * k - y * ((p + eps).log())
    # torch.min over dim=1 returns (values, indices); keep the values
    loss2 = torch.min(prod, dim=1)[0]
    losses = (loss1 + loss2).sum()
    return losses / b

b = 10
k = 5

# Are these the right input sizes?
p = torch.rand(b, k)
y = torch.rand(b, k)

old_res = loss_function(p, y, b)
new_res = new_loss_function(p, y, b)

# the two implementations should agree up to floating point error
print((old_res - new_res).abs().max())
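
If you want to check the speed difference yourself, here is a rough timing sketch that reuses the two functions above (the sizes are made up, and the absolute numbers will depend on your hardware):

import time

b, k = 256, 100
p = torch.rand(b, k, requires_grad=True)
y = torch.rand(b, k)

for fn in (loss_function, new_loss_function):
    start = time.perf_counter()
    for _ in range(10):
        loss = fn(p, y, b)
        loss.backward()
        p.grad = None
    print(fn.__name__, time.perf_counter() - start)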



Thank you so much!

Just a little question:
why torch.ones(1, 1).expand(x, y) and not torch.ones(x, y)?
Does it matter?

This is just a small trick to reduce memory use. The expand function does not actually allocate new memory; it uses stride tricks to build the final Tensor.
You can use either one and should not see any difference in the result. The one with expand will just use a bit less memory.
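
A small sketch to illustrate the difference (the sizes here are made up for illustration): the expanded tensor reports a stride of 0 along both dimensions, meaning every index reuses the single stored element, while torch.ones(b, k) allocates a full (b, k) buffer. One caveat: because all elements alias the same memory, an expanded tensor should not be written to in place.

import torch

b, k = 10, 5

expanded = torch.ones(1, 1).expand(b, k)   # a view: no new memory is allocated
allocated = torch.ones(b, k)               # a real (b, k) buffer

# stride 0 along a dimension means every index maps to the same stored element
print(expanded.stride())    # (0, 0)
print(allocated.stride())   # (5, 1)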


Hi, I have one more question about my loss, maybe you could help me.

Do you know how I can add L2 regularization to this loss?
Thanks!

The simplest way is just to use the weight_decay parameter of the optimizer, if you use one from torch.optim.

Otherwise, just compute the L2 penalty by hand and add it to the loss before running the backward.
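
A rough sketch of both options (the tiny model, the learning rate, and l2_lambda below are placeholders for illustration, not values from this thread; it reuses new_loss_function from above):

import torch
import torch.nn as nn

# stand-in model; your CNN would go here
model = nn.Linear(20, 5)
p = torch.sigmoid(model(torch.rand(10, 20)))
y = torch.rand(10, 5)

# Option 1: let the optimizer apply L2 regularization through weight_decay
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Option 2: compute the L2 penalty by hand and add it to the loss
# (use one option or the other, not both, to avoid regularizing twice)
l2_lambda = 1e-4
loss = new_loss_function(p, y, b=10)
l2_penalty = sum(param.pow(2).sum() for param in model.parameters())
loss = loss + l2_lambda * l2_penalty
loss.backward()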