When I ran Llama 3 on an RTX 3080, it failed with an out-of-memory error: the card has only 10 GB of dedicated memory, while the model needs about 16 GB:
self.weight = Parameter(torch.Tensor(self.output_size_per_partition, self.in_features))
[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. …
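For reference, this is roughly how I confirmed the mismatch. A minimal sketch: device index 0 and the GB conversion are my own choices, and `torch.cuda.mem_get_info` is the standard PyTorch call for querying device memory:

```python
import torch

# Report the device's free and total *dedicated* memory in bytes.
# On my RTX 3080 the total is ~10 GB, well below the ~16 GB the
# model weights need.
free, total = torch.cuda.mem_get_info(0)
print(f"free: {free / 1e9:.2f} GB, total: {total / 1e9:.2f} GB")
```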
However, when I run other workloads such as Stable Diffusion image or video generation, which are also built on PyTorch, they spill over into shared GPU memory (system RAM, which Windows Task Manager reports as "Shared GPU memory") and can allocate much more than the card's dedicated memory.
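To show what I mean, here is a sketch of the probe I used to compare the two cases. The 1 GiB chunk size is arbitrary, and I am assuming the caching allocator raises `torch.cuda.OutOfMemoryError` once it hits the limit:

```python
import torch

# Allocate 1 GiB fp16 chunks until the allocator refuses.
# In my LLM setup this stops near the 10 GB dedicated limit;
# the image/video generation apps appear to keep going, with
# Task Manager showing "Shared GPU memory" climbing instead.
chunks = []
try:
    while True:
        chunks.append(torch.empty(512 * 1024 * 1024,
                                  dtype=torch.float16, device="cuda"))
        print(f"allocated {len(chunks)} GiB")
except torch.cuda.OutOfMemoryError:
    print(f"OOM after {len(chunks)} GiB")
```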
Everything runs on Windows. Why the difference? Is there a configuration that controls this behavior?