Hi, I built a loss function, and the backward pass is very slow:

def loss_function(p, y, b):
    eps = 1e-10
    losses = 0
    k = len(y[0])
    ones = torch.ones(k).cuda()
    for i in range(b):  # loop over the samples in the batch
        loss1 = -((ones - y[i]) @ ((ones - p[i] + eps).log()))
        prod = (ones - y[i]) * k - y[i] * ((p[i] + eps).log())
        loss2 = torch.min(prod)
        losses = losses + (loss2 + loss1)
    return losses / b

Also, what should the ratio between the dataset size and the batch size be?
I have ~300,000 rows in my dataset; maybe a better choice of batch size would help here?

My model is a CNN with 3 conv layers and 2 linear layers.

The most likely reason is the Python for loop in your loss function: it launches many small operations, and each of them also becomes a node in the autograd graph, so the backward pass ends up slow as well.
You may want to change it to work on the whole batch at once.
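For example, assuming `p` and `y` are `(b, k)` tensors (batch along dim 0), a vectorized sketch that computes the same value without the per-sample loop could look like this (the function name is mine, not from your code):

```python
import torch

def loss_function_vectorized(p, y, eps=1e-10):
    # p, y: (b, k) tensors; replaces the per-sample Python loop
    b, k = y.shape
    # loss1 per sample: -(1 - y_i) . log(1 - p_i + eps), summed over k
    loss1 = -((1 - y) * (1 - p + eps).log()).sum(dim=1)  # shape (b,)
    # prod per sample, then the minimum over the k dimension
    prod = (1 - y) * k - y * (p + eps).log()             # shape (b, k)
    loss2 = prod.min(dim=1).values                       # shape (b,)
    # average over the batch, as in the loop version
    return (loss1 + loss2).mean()
```

This builds a handful of large CUDA kernels instead of `b` sets of small ones, so both the forward and the backward graph stay small regardless of batch size.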

This is just a small trick to reduce memory use. The expand function does not actually allocate new memory; it manipulates the tensor's strides to produce the final view.
You can use either and should not see any difference in the result. The one with expand will just use a bit less memory.
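As a quick illustration of that stride trick (my own toy example, not from the thread above), compare `expand` with `repeat`, which does copy the data:

```python
import torch

x = torch.ones(3, 1)

e = x.expand(3, 4)  # view: stride 0 along the expanded dim, no new memory
r = x.repeat(1, 4)  # real copy: allocates a fresh (3, 4) tensor

print(e.stride())                     # stride along dim 1 is 0
print(e.data_ptr() == x.data_ptr())   # e shares storage with x
print(r.data_ptr() == x.data_ptr())   # r does not
```

Both `e` and `r` compare equal element-wise, so downstream computations see the same values; only the memory footprint differs.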