Debugging "CUDA out of memory"

Hello everyone. I suspect I coded something wrong, because I am using a Tesla P100-PCIE-16GB from Colab and the tensors I am generating are not that big (10000x100x100). So I think something is accumulating somewhere. I trimmed the code, so a lot of variables are not defined, but I don't think that matters much.

    for j in range(100):

        f_old = f[..., -1]
        ...
        X = ...  # this input has shape 10000 x 100 x 102 (construction trimmed)
        sigma = net(X)

        alpha = torch.sum(sigma * torch.cumsum(sigma * dt, 1), -1)
        W = torch.randn(10000, 1).unsqueeze(1) * torch.sqrt(torch.tensor(dt))
        W = W.to(device)
        f_new = f_old[:, j:] + (alpha * dt + torch.sum(sigma * W, -1))
        f_new = torch.cat([torch.zeros(MC, j, device=device), f_new], 1)
        f[..., j + 1] = f_new
        del X, W, f_new, f_old, x_mat, x_input_t, x_input_2, alpha, sigma

I get the error message around j=50. Any idea where the problem in my code is?

Depending on the model, that input might be fine or huge. Since you get through ~50 iterations before the OOM is raised, I would recommend checking whether the memory usage increases inside the loop. If it does, you might (accidentally) be storing tensors that are still attached to the computation graph, and you could try to isolate the culprit by calling detach() on those tensors.
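
To make that check concrete, here is a minimal sketch and not your actual code. The small `net`, the shapes, and the names `f_attached`, `f_detached`, and `allocated_mib` are made-up stand-ins (assumptions for illustration). It prints the allocated memory once per iteration and compares writing a graph-attached result into a preallocated tensor with writing a detached one. For scale, a float32 tensor of shape 10000 x 100 x 102 alone takes roughly 389 MiB, so a handful of retained intermediates per iteration could plausibly add up on a 16 GB card.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def allocated_mib():
    """Currently allocated CUDA memory in MiB (0.0 when running on CPU)."""
    if device.type == "cuda":
        return torch.cuda.memory_allocated(device) / 2**20
    return 0.0


# Hypothetical stand-ins for the names in the question (tiny shapes on purpose).
net = torch.nn.Linear(8, 8).to(device)
MC, steps = 16, 5
f_attached = torch.zeros(MC, 8, steps + 1, device=device)
f_detached = torch.zeros(MC, 8, steps + 1, device=device)

for j in range(steps):
    X = torch.randn(MC, 8, device=device)
    sigma = net(X)  # depends on net's parameters, so it carries a graph

    # Writing a graph-attached result into f makes f itself part of the graph,
    # so every iteration's intermediates stay referenced through f.
    f_attached[..., j + 1] = f_attached[..., j] + sigma

    # Writing a detached result keeps f a plain buffer with no history.
    f_detached[..., j + 1] = f_detached[..., j] + sigma.detach()

    print(f"j={j}: {allocated_mib():.2f} MiB allocated")

print("attached:", f_attached.requires_grad, f_attached.grad_fn)
print("detached:", f_detached.requires_grad, f_detached.grad_fn)
```

If the printed number keeps growing with `j` and `f.grad_fn` is not `None` in your loop, the stored history is a likely suspect. Note that `detach()` also cuts the gradient flow, so it is only the right fix if you do not need to backpropagate through `f`; if you never need gradients in that loop, wrapping it in `torch.no_grad()` would be another option.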