Understanding memory allocation


I have a model that accepts a series of inputs and uses previous predictions as inputs for subsequent predictions. A batch, for instance, is a Python list of inputs, and each input in the list can have a batch dimension. My evaluation loop looks like this, with the allocated memory printed after each prediction:

    # compute metrics over the dataset
    for i, batch in enumerate(tqdm(dataloader)):
        if params.cuda:
            batch = net.batch_to_cuda(batch)
        print('memory allocated for model and input batch:', torch.cuda.memory_allocated(0))

        output_batches = []
        targets = []
        losses = []

        # Evaluate on consecutive predictions -- feeding previous predictions as "previous" frames.
        for j, model_in in enumerate(batch):
            # Ground truth

            # Replace neigbor_hd with prior predictions
            for k, out in enumerate(reversed(output_batches)):
                if k < len(model_in['previous']):
                    model_in['previous'][k] = out

            # compute model output and loss
            loss = loss_fn(output_batches[j], targets[j])

            print('memory allocated after pred %d:' % j, torch.cuda.memory_allocated(0))
        total_output_size = 0
        for out in output_batches:
            total_output_size += out.element_size() * out.nelement()
        print('Total output size:', total_output_size)

Since each prediction is an independent inference, and I only intend to use the value of the previous prediction, shouldn’t the memory allocated after each prediction remain the same? Instead, I’m seeing more and more memory allocated with each prediction:

    memory allocated for model and input batch: 198102016
    memory allocated after pred 0: 4024618496
    memory allocated after pred 1: 7850085888
    memory allocated after pred 2: 11675553280
    Total output size: 3145728

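You have a single autograd graph here, with pending gradient computations for all iterations, so nothing gets released: each output that is fed into the next prediction keeps the previous iteration’s graph alive. Calling loss.backward() inside the loop may help, but you’d have to split the graph by calling .detach() on the model outputs that are passed to the next model call.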
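
A minimal sketch of that idea, not the original poster's code: the `nn.Linear` model, the MSE loss, the random tensors, and the way the previous output is added to the next input are all hypothetical stand-ins. The point is only that `loss.backward()` runs once per step and the output is detached before it is reused, so each step has its own small graph.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real model, loss, and data.
model = nn.Linear(8, 8)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = [torch.randn(4, 8) for _ in range(3)]
targets = [torch.randn(4, 8) for _ in range(3)]

previous = torch.zeros(4, 8)  # plays the role of the "previous" frame
step_losses = []

for x, y in zip(inputs, targets):
    # Assumption: the previous prediction is simply added to the input.
    out = model(x + previous)
    loss = loss_fn(out, y)

    optimizer.zero_grad()
    loss.backward()  # backward per step, so this step's graph can be released
    optimizer.step()

    # Detach before reuse: the next step sees only the values,
    # not the autograd history of this step.
    previous = out.detach()
    step_losses.append(loss.item())
```

Without the `.detach()` call, the second `loss.backward()` would try to backpropagate through the first step's graph as well, which has already been released.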

@googlebot You’re right. For the training loop, I detached the previous model outputs before using them as the next inputs, and the memory allocated now stays the same after each prediction. For evaluation, though, I just needed to wrap the loop in a torch.no_grad() context, and memory is freed after each prediction.
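
For anyone landing here later, a minimal sketch of the evaluation case. The `nn.Linear` model, the MSE loss, the random tensors, and the way the previous output is fed back in are hypothetical stand-ins, not my actual code. Inside `torch.no_grad()` no autograd graph is recorded, so the stored outputs hold only their own data.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real model, loss, and data.
model = nn.Linear(8, 8)
loss_fn = nn.MSELoss()

inputs = [torch.randn(4, 8) for _ in range(3)]
targets = [torch.randn(4, 8) for _ in range(3)]

model.eval()
outputs = []
losses = []
previous = torch.zeros(4, 8)  # plays the role of the "previous" frame

with torch.no_grad():  # no graph is built for anything in this block
    for x, y in zip(inputs, targets):
        # Assumption: the previous prediction is simply added to the input.
        out = model(x + previous)
        losses.append(loss_fn(out, y).item())  # store a Python float
        outputs.append(out)
        previous = out  # safe to reuse directly, it carries no history
```

Every tensor in `outputs` has `requires_grad == False` and no `grad_fn`, so keeping the list around costs only the size of the outputs themselves.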