When attempting a training run for a 7-billion-parameter model, I get this error:
OutOfMemoryError: CUDA out of memory. Tried to allocate 344.00 MiB. GPU 0 has a total capacity of 95.00 GiB of which 238.12 MiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 92.96 GiB is allocated by PyTorch, and 667.94 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
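For reference, the allocator hint in the message can be tried by setting the config before torch initializes CUDA; a minimal sketch (the 128 MiB split size is an arbitrary example value, not something taken from the error):

```python
import os

# The caching allocator reads PYTORCH_CUDA_ALLOC_CONF once, when CUDA is
# initialized, so it must be set before importing torch (or exported in the
# shell before launching the script). max_split_size_mb caps how large a
# cached block the allocator will split, which can reduce the fragmentation
# the error message hints at.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Note this only mitigates fragmentation within the device memory the allocator already sees; it does not expose any additional memory to PyTorch.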
When running torch.cuda.get_device_properties(0), torch returns _CudaDeviceProperties(name='GH200 480GB', major=9, minor=0, total_memory=97280MB, multi_processor_count=132). Is the LPDDR5X memory not fully coherent with system memory? If so, is there any way to expose it to PyTorch without a custom allocator?
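For context on the question: tallying the figures from the error against that total_memory=97280MB value suggests the allocator sees only the roughly 96 GB of on-package HBM (my assumption is that the "480GB" in the device name refers to the Grace CPU's LPDDR5X, which torch is not counting), and that HBM is essentially full. Plain arithmetic on the values quoted above:

```python
# All values copied from the OOM message, converted to MiB.
total_mib = 97280                 # total_memory reported by torch
allocated = 92.96 * 1024          # "92.96 GiB is allocated by PyTorch"
reserved_unalloc = 667.94         # "reserved by PyTorch but unallocated"
free = 238.12                     # "238.12 MiB is free"

accounted = allocated + reserved_unalloc + free
print(f"accounted for: {accounted:.0f} of {total_mib} MiB")
# -> accounted for: 96097 of 97280 MiB
# The ~1.2 GiB remainder is CUDA context / driver overhead, and the
# 344 MiB request cannot fit in the ~238 MiB that is actually free.
```

So the failed 344 MiB allocation is consistent with the HBM alone being exhausted, independent of whether the LPDDR5X is reachable.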