CUDA out of memory: why is torch.cuda.memory_reserved() so small?

Dear all,

It seems I can’t get past this anymore (even with multi-GPU) and need to find a solution. Any suggestions would be of great help. Thanks!

I am getting the following CUDA memory error:
“RuntimeError: CUDA out of memory. Tried to allocate 720.00 MiB (GPU 0; 39.59 GiB total capacity; 35.53 GiB already allocated; 545.44 MiB free; 37.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF”
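For reference, the error message itself points at `PYTORCH_CUDA_ALLOC_CONF`; as I understand it, setting `max_split_size_mb` would look roughly like this (128 MiB is just an example value, not a recommendation, and the variable has to be set before the first CUDA allocation in the process):

```python
import os

# Must be set before the first CUDA allocation (commonly before any
# tensor is moved to the GPU). The value is in MiB; 128 is only an example.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```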

Question: I print a status message at several places in the code, and “torch.cuda.memory_reserved(0)” reports only 2.0 (2.0 MiB…?). Why is the reserved memory so small? :open_mouth: Also, the info in the error (“35.53 GiB already allocated” and “37.21 GiB reserved in total by PyTorch”) does not match the status message from “torch.cuda.memory_reserved(0)”. (Here I am using only one GPU.)

Here is the status printed at different places in my code (up to just before it throws the error):
===>>> info: 1 – GPU status: total: 40536.1875 // reserved: 0.0 // allocated: 0.0 // free: 0.0
===>>> info: 2 – GPU status: total: 40536.1875 // reserved: 0.0 // allocated: 0.0 // free: 0.0
===>>> info: 3 – GPU status: total: 40536.1875 // reserved: 0.0 // allocated: 0.0 // free: 0.0
===>>> info: 4 – GPU status: total: 40536.1875 // reserved: 0.0 // allocated: 0.0 // free: 0.0
===>>> info: 5 – GPU status: total: 40536.1875 // reserved: 0.0 // allocated: 0.0 // free: 0.0
16384 9
===>>> info: 6 – GPU status: total: 40536.1875 // reserved: 2.0 // allocated: 1.00341796875 // free: 0.99658203125
===>>> info: 7 – GPU status: total: 40536.1875 // reserved: 2.0 // allocated: 1.00341796875 // free: 0.99658203125

Below is my function that prints the GPU status:
import torch

def get_gpu_info(gpu_n, info=''):
    mb_in_byte = 1048576.0
    # note: use gpu_n here; the original hardcoded device 0 and ignored the parameter
    t = torch.cuda.get_device_properties(gpu_n).total_memory / mb_in_byte
    r = torch.cuda.memory_reserved(gpu_n) / mb_in_byte
    a = torch.cuda.memory_allocated(gpu_n) / mb_in_byte
    f = r - a  # free inside reserved
    print('\n ===>>> info: {} -- GPU status: total: {} // reserved: {} // allocated: {} // free: {} '.format(info, t, r, a, f))
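As a side note, PyTorch also exposes torch.cuda.memory_summary(), which prints a much more detailed allocator breakdown than the manual print above; a minimal sketch (the import guard is only there so the snippet runs even where PyTorch or a GPU is absent):

```python
# Sketch: fuller per-device report via torch.cuda.memory_summary().
try:
    import torch
except ImportError:  # keeps the snippet runnable without PyTorch installed
    torch = None

def gpu_summary(gpu_n=0):
    """Return PyTorch's detailed allocator report for device gpu_n,
    or None when PyTorch or a CUDA device is unavailable."""
    if torch is None or not torch.cuda.is_available():
        return None
    return torch.cuda.memory_summary(device=gpu_n, abbreviated=True)

print(gpu_summary(0))
```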


Could you post an executable code snippet showing this behavior, please?