When I run a 70B LLM in int8, which needs about 70 GB of video memory, the system loads about 46 GB onto one A40 graphics card, the second A40 is never used, and the following error is reported:
torch.empty(self.output_size,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB (GPU 0; 44.42 GiB total capacity; 43.11 GiB already allocated; 426.81 MiB free; 43.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
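For reference, here is a back-of-the-envelope check of the numbers above. It is only a rough sketch: it counts the int8 weights at 1 byte per parameter and ignores activations, KV cache, and framework overhead. The helper names are my own and the per-GPU capacity is taken from the error message.

```python
# Rough estimate only: int8 weights at 1 byte per parameter.
# Activations, KV cache, and framework overhead are NOT included.
GIB = 1024 ** 3


def weights_gib(num_params: float, bytes_per_param: float = 1.0) -> float:
    """Approximate size of the model weights in GiB."""
    return num_params * bytes_per_param / GIB


def fits(required_gib: float, per_gpu_gib: float, num_gpus: int) -> bool:
    """True if the weights fit in the combined capacity of the GPUs."""
    return required_gib <= per_gpu_gib * num_gpus


required = weights_gib(70e9)  # 70B parameters in int8
per_gpu = 44.42               # "total capacity" reported in the error, in GiB

print(f"weights: {required:.1f} GiB")
print("fits on 1 GPU:", fits(required, per_gpu, 1))
print("fits on 2 GPUs:", fits(required, per_gpu, 2))
```

By this estimate the weights cannot fit on a single A40 but should fit if they are split across both cards, which is why I expected the second card to be used.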
How can I get the model to load across both A40 cards? Please help!