RuntimeError: CUDA out of memory. Tried to allocate - Can I solve this problem?

Hello everyone. I am trying to make CUDA work with the OpenAI Whisper release. My current setup works just fine on the CPU, where I use the medium.en model.

I have installed CUDA-enabled PyTorch on a Windows 10 computer; however, when I try speech-to-text decoding with CUDA enabled, it fails with an out-of-memory error:

RuntimeError: CUDA out of memory. Tried to allocate 70.00 MiB (GPU 0; 4.00 GiB total capacity; 2.87 GiB already allocated; 0 bytes free; 2.88 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
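
For context, this is roughly what I run (simplified; audio.mp3 stands in for my input file):

    import whisper

    # Loading the model onto the GPU succeeds; the out-of-memory error
    # above then appears during transcription/decoding.
    model = whisper.load_model("medium.en", device="cuda")
    result = model.transcribe("audio.mp3")
    print(result["text"])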

I disabled and re-enabled the graphics card before running the code, so its VRAM was completely empty. I tested this on my laptop, which has another GPU driving the display, so my GTX 1050 was literally using 0 MB of memory beforehand.

My computer also has 32 GB of RAM, and CPU transcription works very well but is just too slow: in 7 hours it processed only 1 hour of speech.

My installed version:

Name: torch
Version: 1.12.1+cu116
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: c:\python399\lib\site-packages
Requires: typing-extensions
Required-by: whisper, torchvision, torchaudio

Do you have any suggestions to solve this problem?

https://i.imgur.com/2TRloGX.png

The problem could be the GPU memory used by loading all the kernels PyTorch comes with, which takes a good chunk of memory. You can check that by loading PyTorch, generating a small CUDA tensor, and then comparing how much GPU memory is in use vs. how much PyTorch says it has allocated.
There has been work to put PyTorch on a bit of a diet there (e.g. the “JITerator”), but I’m not sure about the current state, in particular on Windows.

Fun fact: In the olden times, PyTorch would print “Buy more RAM” along with the error message, but then things got all serious. :slight_smile:

Best regards

Thomas


Could you give me instructions to run and see? I really don’t know how to do what you are saying :smiley: Can we somehow assign virtual RAM to the GPU? I am OK with it being slower than dedicated GPU RAM.

I have started considering purchasing a 12 GB RTX 3060 card. What do you think?

Start Python and do import torch; a = torch.ones(1, device="cuda") or so, and then check the GPU memory usage (it’s nvidia-smi on Linux; I don’t quite know on Windows).
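
In code, that check could look like this (a sketch; the tiny tensor is just there to force CUDA initialization):

    import torch

    # A tiny allocation forces the CUDA context and PyTorch's kernels
    # to be loaded onto the GPU.
    a = torch.ones(1, device="cuda")

    # What PyTorch's own allocator accounts for...
    print(f"allocated by PyTorch: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
    print(f"reserved by PyTorch:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")

    # ...vs. what the driver reports for the whole device; the gap is
    # mostly the CUDA context and kernel overhead.
    free, total = torch.cuda.mem_get_info()
    print(f"used on the device:   {(total - free) / 1024**2:.1f} MiB")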

Best regards

Thomas


Thank you very much.

Also, I just ordered a 12 GB RTX 3060 ($411 in Turkey, almost equal to my monthly salary).

I hope it can run the large model.

These GPUs are decidedly not cheap. I feel for you!

Hey, bro. Did you solve this problem after changing your GPU to one with more memory?

After the GPU change, it works great. I am using it daily.
Here is my video: How to do Free Speech-to-Text Transcription Better Than Google Premium API with OpenAI Whisper Model - YouTube


Hi @FurkanGozukara, how did you solve the CUDA memory issue?


To address the “CUDA out of memory” error, reduce the batch size, use a smaller model, delete unnecessary variables and release cached blocks with torch.cuda.empty_cache(), or upgrade to a GPU with more memory.
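
For example, a minimal sketch of freeing memory between runs (the tensor is a stand-in for a real workload; note that empty_cache() only returns cached, unused blocks to the driver and cannot free tensors that are still referenced):

    import gc
    import torch

    x = torch.randn(1024, 1024, device="cuda")  # stand-in for a real workload

    del x                      # drop the last Python reference to the tensor
    gc.collect()               # make sure the object is actually collected
    torch.cuda.empty_cache()   # return cached, unused blocks to the driver

    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")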

Neither this nor manual garbage collection resolves it in every case. I also see a lot of folks suggesting to decrease the batch size; this often does nothing either, and I’m not the only one to see that. Secondly, I had used the model I’m testing quite a bit before this issue suddenly cropped up.

I’m not saying you shouldn’t share these steps, but they are in no way a guarantee. Something is amiss with the dependencies, the container, or both.


I am having a similar issue. I have a 12 GB RTX 3060, but it tells me that I have 0.00 GB allocated.
I just ran code that works on a different GPU and machine, but it does not work on this new GPU and machine.

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.46 GiB. GPU 0 has a total capacity of 12.00 GiB of which 10.98 GiB is free. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.
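
For anyone trying the suggestion in the error message, this is roughly how that option is set (the environment variable must be in place before CUDA is initialized):

    import os

    # Must be set before the first CUDA allocation; setting it before
    # importing torch is the safest place.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

    import torch
    a = torch.ones(1, device="cuda")  # allocator now uses expandable segments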

Update: I also found a problem with the GPU during a different test. I’m not sure how it relates to this issue yet, but when I switched to a new card of a different brand, still a 3060 12 GB, everything worked. I am unable to keep investigating what went wrong, as I have already replaced the GPU.

It still shows an out-of-memory error when I intentionally over-allocate memory, but that’s expected. This is different from the reported issue, where allocating only 1.46 GiB failed.
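
For completeness, the deliberate over-allocation test looks roughly like this (the 64 GiB request is an arbitrary value far above the card’s 12 GiB):

    import torch

    try:
        # Request far more memory than the 12 GiB card has;
        # this is expected to fail.
        x = torch.empty(64 * 1024**3, dtype=torch.uint8, device="cuda")
    except torch.OutOfMemoryError as e:
        print("expected OOM:", e)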