Where is all the memory going?

The following error message is confusing. If I have 22 GiB of total capacity and only 6 MiB free, how can I check where the rest is going?

OutOfMemoryError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 21.99 GiB total capacity; 1.04 GiB already allocated; 6.12 MiB free; 1.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Here is my code:

import torch
from transformers import BertModel

# Load a pretrained BERT encoder
model = BertModel.from_pretrained('bert-base-uncased')

# Check for GPU availability and set the device accordingly
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = model.to(device)

# Put the model in evaluation mode (disables dropout)
model.eval()

batch_size = 50
num_batches = len(tokenized_dataset["input_ids"]) // batch_size + (1 if len(tokenized_dataset["input_ids"]) % batch_size != 0 else 0)

# Create an empty list to store the output
results = []

# Forward pass
with torch.no_grad(): 
    for i in range(num_batches):
        start = i * batch_size
        end = start + batch_size
        
        input_ids = torch.tensor(tokenized_dataset["input_ids"][start:end]).to(device)
        attention_mask = torch.tensor(tokenized_dataset["attention_mask"][start:end]).to(device)

        # Feed the batch into the model
        output = model(input_ids, attention_mask=attention_mask)

        # Try to free up some memory
        del input_ids
        del attention_mask
        torch.cuda.empty_cache() 

        # Append the output to the results list (the output tensors remain on the GPU)
        results.append(output)

I'm struggling a bit to understand your question. Are you asking about the remaining 6 MiB, or about where the 22 GiB is going?

To monitor your VRAM usage in real time, nvitop is a good option.
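If you just want a quick snapshot from inside the script instead of a separate monitor, PyTorch can report both its own counters and what the driver sees for the whole device. A minimal sketch (torch.cuda.mem_get_info needs PyTorch 1.11 or newer):

import torch

# Memory held by this process's PyTorch allocator
allocated = torch.cuda.memory_allocated()  # bytes in live tensors
reserved = torch.cuda.memory_reserved()    # bytes cached by the allocator (includes allocated)

# Device-wide view from the CUDA driver, all processes included
free, total = torch.cuda.mem_get_info()

print(f"allocated: {allocated / 2**30:.2f} GiB")
print(f"reserved:  {reserved / 2**30:.2f} GiB")
print(f"free:      {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")

If free is far below total while reserved is small, the memory is being held outside your PyTorch process.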

PyTorch's profilers give more detail, but they require a bit more time investment.
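For example, a memory-profiling pass over a single batch could look roughly like this (a sketch reusing model, input_ids and attention_mask from your code above; assumes a CUDA build of PyTorch):

import torch
from torch.profiler import profile, ProfilerActivity

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    profile_memory=True,  # record tensor allocations and deallocations
) as prof:
    with torch.no_grad():
        output = model(input_ids, attention_mask=attention_mask)

# Show the ops that allocated the most CUDA memory
print(prof.key_averages().table(sort_by="self_cuda_memory_usage", row_limit=10))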


Thanks, that's helpful!

I would expect an allocation of 24.00 MiB not to be a problem, because I have 22 GiB of total capacity. Already allocated + free + reserved in total by PyTorch only account for about 2 GiB.

My question is: what is happening with the remaining 20 GiB, and why can't I use it for model inference?

The "free" figure in that error message is the device-wide free memory reported by the CUDA driver, not just what is free inside PyTorch's own pool, so the missing ~20 GiB is most likely held by something else. Check other processes and the overall GPU utilization in nvidia-smi to see what is using the memory.
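If you'd rather do that check from Python, the NVML bindings (pip install nvidia-ml-py) expose the same per-process memory usage that nvidia-smi shows. A rough sketch:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0, as in the error message

# List compute processes currently holding memory on this GPU
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    used = proc.usedGpuMemory or 0  # can be None without sufficient permissions
    print(f"pid {proc.pid}: {used / 2**20:.0f} MiB")

pynvml.nvmlShutdown()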