CUDA out of memory, empty_cache() returns ">"

Trying to train, I get
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1002.00 MiB. GPU 0 has a total capacity of 15.69 GiB of which 788.88 MiB is free.

Recommended answer is
torch.cuda.empty_cache()

All I get when I enter this command is a greater-than (>) prompt.

Ubuntu
Using Meta-Llama-3-8B-Instruct and Oobabooga to train.
16 GB GPU - nothing significant should be running.

| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4    |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    Off |   00000000:01:00.0  On |                  N/A |
|  0%   35C    P8              7W /  285W |   15273MiB /  16376MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2501      G   /usr/lib/xorg/Xorg                            195MiB |
|    0   N/A  N/A      2884    C+G   ...libexec/gnome-remote-desktop-daemon        211MiB |
|    0   N/A  N/A      2937      G   /usr/bin/gnome-shell                           26MiB |
|    0   N/A  N/A      3807      G   /opt/teamviewer/tv_bin/TeamViewer              16MiB |
|    0   N/A  N/A      4551      G   ...AAAAAAAACAAAAAAAAAA= --shared-files         25MiB |
|    0   N/A  N/A      4646      G   /usr/bin/nautilus                              27MiB |
|    0   N/A  N/A      5920      G   ...seed-version=20240913-050142.817000         85MiB |
|    0   N/A  N/A     14891      C   python                                      14652MiB |
+-----------------------------------------------------------------------------------------+

Suggestions appreciated.

I’m not sure I understand your question correctly. Are you in a Python REPL and just seeing the >>> prompt waiting for your input, or is the call itself printing it?
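For reference, torch.cuda.empty_cache() is a Python call, so it needs to run inside a Python interpreter or script rather than directly in the shell. A minimal sketch, assuming PyTorch with CUDA support is installed in the environment you train in:

import torch

if torch.cuda.is_available():
    # Release cached, currently unused blocks held by PyTorch's caching allocator
    torch.cuda.empty_cache()
    # Inspect how much memory is allocated by live tensors vs. reserved by the cache
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.0f} MiB")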

I typed it on a Linux command line.
I’m just getting started and only have a glimmer of what I’m doing. Should I be inside an environment first?

I’m not sure where exactly you are seeing this output, but in any case calling empty_cache() won’t fix the OOM issue, since PyTorch will try to release the cache and reallocate the memory internally before raising the OOM error to the user. You might thus need to decrease the overall memory usage, e.g. by reducing the batch size of your training, as in the sketch below.
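As a rough illustration of that suggestion (the model, dataset, and sizes below are placeholders, not your actual Llama-3/Oobabooga setup), you could shrink the per-step batch and optionally accumulate gradients so the effective batch size stays the same while the peak GPU memory usage drops:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-ins for the real model and dataset
model = nn.Linear(128, 2).to(device)
dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Smaller micro-batches lower the peak memory; accumulating gradients over
# several steps keeps the effective batch size at 4 * 8 = 32.
micro_batch_size = 4
accum_steps = 8
loader = DataLoader(dataset, batch_size=micro_batch_size, shuffle=True)

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)
    loss = criterion(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

If you are training through the webui rather than your own script, the equivalent would be lowering the batch size (and/or micro batch size) in its training settings.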