I am running PyTorch in Docker, using the 2.1.2-cuda11.8-cudnn8-devel image.
I was trying to run the training script from the GitHub repository xg-chu/CrowdDet and got the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 62.00 MiB. GPU 0 has a total capacty of 2.00 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 967.91 MiB is allocated by PyTorch, and 76.09 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
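A worked reading of the numbers in this message, as a small Python sketch. The values are copied from the log; how the leftover memory and the huge per-process figure are interpreted is an assumption, not something the message itself states. PyTorch's caching allocator accounts for 967.91 + 76.09 = 1044.00 MiB, so roughly 1004 MiB of the 2048 MiB card is held outside it. The 17179869184.00 GiB figure is exactly what 2^64 - 1 bytes looks like in GiB, which is consistent with an unsigned 64-bit -1, i.e. NVML's "value not available" sentinel for per-process memory. NVML documents that this value is always reported under WDDM, which would apply if the container runs under WSL2 (the Windows-style driver version 511.69 and the /Xwayland processes suggest it might).

```python
# Decode the figures in the OOM message (values copied from the log).
TOTAL_MIB = 2.00 * 1024        # "total capacty of 2.00 GiB"
ALLOCATED_MIB = 967.91         # "allocated by PyTorch"
RESERVED_UNUSED_MIB = 76.09    # "reserved by PyTorch but unallocated"

# Memory held by PyTorch's caching allocator = allocated + reserved-but-unused.
reserved_by_pytorch = ALLOCATED_MIB + RESERVED_UNUSED_MIB

# Remainder of the card, held outside PyTorch's allocator
# (assumption: CUDA context, display server, other processes).
outside_pytorch = TOTAL_MIB - reserved_by_pytorch

print(f"reserved by PyTorch: {reserved_by_pytorch:.2f} MiB")  # 1044.00 MiB
print(f"outside PyTorch:     {outside_pytorch:.2f} MiB")      # 1004.00 MiB

# The "17179869184.00 GiB" figure: -1 stored in an unsigned 64-bit field
# (2**64 - 1 bytes), assumed to be NVML's "value not available" sentinel.
SENTINEL_BYTES = (1 << 64) - 1
sentinel_gib = SENTINEL_BYTES / 2**30
print(f"sentinel in GiB:     {sentinel_gib:.2f} GiB")         # 17179869184.00 GiB
```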
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.51       Driver Version: 511.69       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:02:00.0 Off |                  N/A |
| N/A   49C    P0    N/A /  N/A |      0MiB /  2048MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        22      G   /Xwayland                       N/A      |
|    0   N/A  N/A        33      G   /Xwayland                       N/A      |
|    0   N/A  N/A        34      G   /Xwayland                       N/A      |
+-----------------------------------------------------------------------------+
- I'm using an Asus Zenbook laptop with an NVIDIA® GeForce® MX250 (2 GB GDDR5). Is that why the GPU is capped at 2 GiB?
- The error says "Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use." Is it normal for a process to report that much memory?
- How do I tell how much GPU memory I need to train this model?
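One way to get a first answer to the last question is a back-of-the-envelope estimate from the parameter count. The Python sketch below is a lower bound only: it counts weights, gradients and optimizer state, assumes everything is stored in fp32 (4 bytes per value), and assumes SGD with momentum (one extra state per parameter; Adam keeps two). It deliberately ignores activations, which depend on batch size and image resolution and are often the largest term for detection models. The parameter count used below is hypothetical, not CrowdDet's actual figure.

```python
# Rough lower bound on training memory from the parameter count alone.
# Activations are NOT included, so real usage will be higher.

def static_training_bytes(num_params: int,
                          bytes_per_value: int = 4,
                          optimizer_states_per_param: int = 1) -> int:
    """Bytes for weights + gradients + optimizer state.

    bytes_per_value=4 assumes fp32; optimizer_states_per_param=1 assumes
    SGD with momentum (Adam would use 2).
    """
    copies = 1 + 1 + optimizer_states_per_param  # weights, grads, states
    return num_params * bytes_per_value * copies


def to_mib(num_bytes: int) -> float:
    """Convert bytes to MiB (2**20 bytes)."""
    return num_bytes / 2**20


# HYPOTHETICAL parameter count for illustration; the real value can be read
# from the model with sum(p.numel() for p in model.parameters()).
num_params = 40_000_000
sgd_bytes = static_training_bytes(num_params)
adam_bytes = static_training_bytes(num_params, optimizer_states_per_param=2)
print(f"SGD+momentum: {to_mib(sgd_bytes):.1f} MiB before activations")  # 457.8
print(f"Adam:         {to_mib(adam_bytes):.1f} MiB before activations")  # 610.4
```

Measuring is more reliable than estimating: after running one training iteration on a GPU with enough memory, torch.cuda.max_memory_allocated() reports the peak memory the caching allocator actually handed out (torch.cuda.reset_peak_memory_stats() clears it between runs).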