Memory increases while training

Hello,

I’m trying to understand why my loss function keeps increasing GPU memory usage during training. I have tried every solution I found on the internet (del, gc.collect()) but the memory still grows…

import torch
from torch.autograd import Variable

def customized_loss(pred, target, x):
    '''
    This function takes the output of the model (pred), the target values (target), and the input of the model (x).
    There are two helper functions:
        - the first computes all the jacobians of pred w.r.t. x
        - the second calculates the regularizer from the jacobians found

    Args :
        - pred : batch_size x num_classes matrix, the output of the model for input x
        - target : batch_size x num_classes matrix, the target values for x
        - x : batch_size x input_size matrix, the current batch

    Returns :
        - the customized loss, float
    '''
    def getJacobians(pred, x):
        '''
        This function calculates the jacobian of the model output (pred) w.r.t. each instance in x

        Args :
            - pred : batch_size x num_classes matrix, the output of the model for input x
            - x : batch_size x input_size matrix, the current batch

        Returns :
            - the jacobians : a batch_size x input_size x num_classes tensor

        '''
        jacobians = Variable(torch.zeros(batch_size, input_size, num_classes)).type(dtype)
        for x_ in range(batch_size):
            for jrc in range(len(jacobian_rows_construct)):
                pred[x_].backward(torch.Tensor([jacobian_rows_construct[jrc]]).type(dtype), create_graph=True)
                jacobians[x_, :, jrc] = x.grad[x_]
                x.grad.data.zero_()
        net.zero_grad()
        return jacobians

    def regularizer(jacobians):
        '''
        This function calculates the regularizer using only tensor operations.
        It uses the indices found before and the similarity value for each pair of indices.

        First, using gather, we keep all the derivatives we need, then we compute all the norms,
        multiply them by our similarity vector and sum all the elements.

        Args :
        - jacobians : 3D tensor (batch_size x input_size x num_classes), contains all jacobians for the current batch

        Returns :
        - the regularizer, float
        '''
        norms = torch.norm(jacobians.gather(1, ind_i) - jacobians.gather(1, ind_j), 2, 2)
        return (norms*similarities).sum()

    # calculate loss
    cost = criterion(pred, target)
    # calculate regularizer
    reg = regularizer(getJacobians(pred, x))

    # put them together and return
    return cost + reg

I think the problem comes from getJacobians but I don’t see where… In my training loop I have put del loss, outputs, batch_x, batch_y just after loss.backward().
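
For reference, here is a sketch of an alternative getJacobians I am considering. It uses torch.autograd.grad instead of backward(), so the gradients are returned directly and nothing is written into x.grad or the parameters’ .grad buffers between calls. This is only a sketch: it assumes each row of jacobian_rows_construct simply selects one output class, that the predictions of different samples in the batch are independent of each other (no batch-norm style coupling), and I have not verified that it actually stops the memory growth:

def getJacobiansViaGrad(pred, x):
    # Same shape as before: batch_size x input_size x num_classes.
    # torch.autograd.grad returns the gradients instead of accumulating them
    # into x.grad, so there is nothing to zero out afterwards.
    columns = []
    for c in range(num_classes):
        grad_outputs = torch.zeros(batch_size, num_classes).type(dtype)
        grad_outputs[:, c] = 1.0  # pick class c for every sample in the batch
        grads, = torch.autograd.grad(pred, x, grad_outputs=grad_outputs,
                                     create_graph=True, retain_graph=True)
        columns.append(grads)  # batch_size x input_size
    return torch.stack(columns, dim=2)

If I understand the docs correctly, net.zero_grad() and x.grad.data.zero_() should not be needed at all with this version, since nothing touches the .grad attributes.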

Hey,
I’m having the same issue. I have to compute jacobians many times in a loop.
An individual call to compute the jacobian is fast and cheap, but when I use it in a loop it slows down further with every call.

Did you solve this problem by any chance?