2 Nvidia A40 GPUs, but still getting "torch.cuda.OutOfMemoryError"

dongkuang · November 17, 2023, 3:21am

I am running a 70B LLM quantized to int8, which needs about 70 GB of GPU memory. The system loads about 46 GB onto one A40, the other A40 is never used, and then the following error is reported:

torch.empty(self.output_size,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB (GPU 0; 44.42 GiB total capacity; 43.11 GiB already allocated; 426.81 MiB free; 43.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Please help me!

ptrblck · November 17, 2023, 6:17pm

Reduce the memory requirement, e.g. by decreasing the batch size, using checkpointing, or offloading to the CPU.
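As a rough sanity check on the numbers in the question, here is a back-of-the-envelope estimate of the memory needed for the weights alone. It assumes 1 byte per parameter for int8 and ignores activations, the KV cache, and the CUDA context, so the real requirement is higher. The helper name `weight_memory_gib` is made up for this illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (no activations or cache)."""
    return num_params * bytes_per_param / GIB


# 70B parameters, int8 = 1 byte per parameter (assumption: every weight is int8)
weights_gib = weight_memory_gib(70e9, 1)

# Per-GPU capacity as reported in the error message above
gpu_capacity_gib = 44.42

print(f"weights alone: {weights_gib:.1f} GiB")
print(f"fits on one GPU:     {weights_gib < gpu_capacity_gib}")
print(f"fits across two GPUs: {weights_gib < 2 * gpu_capacity_gib}")
```

So the weights alone exceed what a single A40 can hold, which matches the allocation failing once GPU 0 is full.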

dongkuang · November 19, 2023, 4:29am

Thank you! But why can’t the two A40 cards work together to solve this problem? I have also installed NVLink, but the result is still the same!

ptrblck · November 19, 2023, 3:21pm

You can use both GPUs, but you would need to implement it explicitly, e.g. via pipeline-parallel approaches, as PyTorch won’t shard the model automatically.
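A minimal sketch of what "explicitly" can look like, using a toy model rather than the 70B LLM from the question: the first half of the layers is placed on one device, the second half on the other, and the activations are moved between them in `forward`. Note that this is naive model parallelism (only one GPU is busy at a time), not a full pipeline-parallel schedule with micro-batches. The class name `TwoDeviceModel` and the layer sizes are hypothetical, and the code falls back to the CPU if fewer than two GPUs are visible so it still runs anywhere.

```python
import torch
import torch.nn as nn

# Use two GPUs when available; otherwise fall back to CPU (assumption for portability).
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")


class TwoDeviceModel(nn.Module):
    """Toy model (hypothetical) whose layers are split across two devices."""

    def __init__(self, hidden: int = 64, layers_per_stage: int = 2):
        super().__init__()

        def make_stage() -> nn.Sequential:
            return nn.Sequential(
                *[nn.Linear(hidden, hidden) for _ in range(layers_per_stage)]
            )

        # Each stage's parameters live on their own device.
        self.stage0 = make_stage().to(dev0)
        self.stage1 = make_stage().to(dev1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage0(x.to(dev0))  # first half runs on dev0
        x = self.stage1(x.to(dev1))  # move activations, second half runs on dev1
        return x


model = TwoDeviceModel()
with torch.no_grad():  # inference only, no autograd buffers
    out = model(torch.randn(4, 64))

print(out.shape, out.device)
```

With this layout each GPU only holds the parameters of its own stage, so the per-device memory is roughly halved compared to loading everything on GPU 0.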