Dear all,
I can no longer get past this error, even with multiple GPUs, and need to find a solution. Any suggestions would be a great help. Thanks!
I am getting the following CUDA memory error:
“RuntimeError: CUDA out of memory. Tried to allocate 720.00 MiB (GPU 0; 39.59 GiB total capacity; 35.53 GiB already allocated; 545.44 MiB free; 37.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF”
Question: I print a status message at several places in the code, and “torch.cuda.memory_reserved(0)” reports only 2.0 (2.0 MiB?). Why is the allocated memory so small? Also, the figures in the error, such as “35.53 GiB already allocated” and “37.21 GiB reserved in total by PyTorch”, do not match the status messages based on “torch.cuda.memory_reserved(0)”. (Here I am using only one GPU.)
Here is the status output from different places in my code (up to the point where it throws the error):
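To make the mismatch concrete, the sizes quoted in the error can be converted to the same MiB units that my status function uses. This is only a minimal sketch: `parse_oom_message` and `OOM_MESSAGE` are names made up for this illustration, and the regular expressions assume the exact message format quoted above.

```python
import re

# Error text copied from the message quoted in this post.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 720.00 MiB "
    "(GPU 0; 39.59 GiB total capacity; 35.53 GiB already allocated; "
    "545.44 MiB free; 37.21 GiB reserved in total by PyTorch)"
)

# Multipliers that convert each unit to MiB.
_TO_MIB = {"KiB": 1.0 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Return the sizes quoted in a CUDA OOM message, all in MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
        "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
        "free": r"([\d.]+) (KiB|MiB|GiB) free",
        "reserved": r"([\d.]+) (KiB|MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * _TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom_message(OOM_MESSAGE)
# Memory held by the caching allocator but not backing any live tensor.
cached_but_unused = sizes["reserved"] - sizes["allocated"]
print(sizes)
print("reserved - allocated (MiB):", round(cached_but_unused, 2))
```

For the message above this works out to roughly 36382.72 MiB allocated and 38103.04 MiB reserved, a gap of about 1720 MiB, compared with the 2.0 MiB that “torch.cuda.memory_reserved(0)” reports in my status messages.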
===>>> info: 1 – GPU status: total: 40536.1875 // reserved: 0.0 // allocated: 0.0 // free: 0.0
===>>> info: 2 – GPU status: total: 40536.1875 // reserved: 0.0 // allocated: 0.0 // free: 0.0
===>>> info: 3 – GPU status: total: 40536.1875 // reserved: 0.0 // allocated: 0.0 // free: 0.0
===>>> info: 4 – GPU status: total: 40536.1875 // reserved: 0.0 // allocated: 0.0 // free: 0.0
===>>> info: 5 – GPU status: total: 40536.1875 // reserved: 0.0 // allocated: 0.0 // free: 0.0
16384 9
===>>> info: 6 – GPU status: total: 40536.1875 // reserved: 2.0 // allocated: 1.00341796875 // free: 0.99658203125
===>>> info: 7 – GPU status: total: 40536.1875 // reserved: 2.0 // allocated: 1.00341796875 // free: 0.99658203125
Below is my function that prints the GPU status:
import torch

def get_gpu_info(gpu_n, info=''):
    # Note: gpu_n is currently unused; device 0 is always queried.
    mb_in_byte = 1048576.0  # bytes per MiB
    t = torch.cuda.get_device_properties(0).total_memory / mb_in_byte
    r = torch.cuda.memory_reserved(0) / mb_in_byte
    a = torch.cuda.memory_allocated(0) / mb_in_byte
    f = r - a  # free inside reserved
    print('\n ===>>> info: {} – GPU status: total: {} // reserved: {} // allocated: {} // free: {} '.format(info, t, r, a, f))
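One possible way to narrow this down is a variant of the function above that also reports the peak allocation and the free memory seen by the driver. This is a rough sketch, not a confirmed fix: `format_gpu_status` and `get_gpu_info_with_peak` are hypothetical names, and it assumes a PyTorch version that provides `torch.cuda.mem_get_info` and `torch.cuda.max_memory_allocated`.

```python
MIB = 1024 * 1024  # bytes per MiB


def format_gpu_status(info, total, reserved, allocated, peak, device_free):
    """Build a status line from byte counts; all values are shown in MiB."""
    return (
        "===>>> info: {} - GPU status: total: {:.1f} // reserved: {:.1f} // "
        "allocated: {:.1f} // peak allocated: {:.1f} // device free: {:.1f}"
    ).format(info, total / MIB, reserved / MIB, allocated / MIB,
             peak / MIB, device_free / MIB)


def get_gpu_info_with_peak(device=0, info=""):
    """Print allocator and device-level memory for one CUDA device."""
    import torch  # imported here so format_gpu_status works without PyTorch

    # Free and total memory as seen by the CUDA driver (covers all processes).
    device_free, total = torch.cuda.mem_get_info(device)
    line = format_gpu_status(
        info,
        total,
        torch.cuda.memory_reserved(device),
        torch.cuda.memory_allocated(device),
        # Highest allocated value since program start or the last stats reset.
        torch.cuda.max_memory_allocated(device),
        device_free,
    )
    print(line)
    return line
```

Calling `get_gpu_info_with_peak(0, info="7")` at the same places as the original prints would show whether memory spikes between two status messages (peak allocated) or is held outside this process's allocator (device free).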