Training works with 1080Ti but not with 3090. OOM error

Hi there,
I’m training a model for a style transfer app.
I’ve used this training code for two years now, and it works perfectly with a GTX 1080 Ti (12 GB), but unfortunately it does not work on an RTX 3090.
Here is the error:

  File "train-monet_giverny.py", line 198, in <module>
    train()
  File "train-monet_giverny.py", line 112, in train
    content_features = VGG(content_batch.add(imagenet_neg_mean))
  File "C:\Users\smartTour\tableaux\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\smartTour\Documents\Train-Relook\vgg.py", line 46, in forward
    x = layer(x)
  File "C:\Users\smartTour\tableaux\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\smartTour\tableaux\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\smartTour\tableaux\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 198.00 MiB (GPU 0; 24.00 GiB total capacity; 3.82 GiB already allocated; 17.67 GiB free; 24.00 GiB allowed; 4.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The batch size is 1 and the image size is 800x600. I manage to train on the GTX 1080 Ti with larger image sizes.
The curious thing is that it uses 8 GB to train on the GTX 1080 Ti, but it won’t allocate more than 4 GiB on the RTX 3090.
I’ve tried many versions of Python and CUDA, but it changes nothing. I’m using the latest driver version.
Is there a way to “force” an allocation of 10 GiB?
Any ideas?

“CUDA out of memory” suggests that your GPU is running out of memory during training. This could be due to the RTX 3090’s architecture differences or the way memory is being managed. You can try the following:

  1. Set the environment variable PYTORCH_CUDA_ALLOC_CONF to control PyTorch’s memory allocation behavior. To reduce fragmentation, you can set the max_split_size_mb configuration option:
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb=4096'

These lines should be placed at the beginning of your script, before importing PyTorch. This sets the maximum split size to 4096 MB (4 GB); you can adjust the value as needed.
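Putting it together, a minimal sketch of the intended ordering (assuming nothing else in the process has imported torch yet):

import os

# The allocator reads PYTORCH_CUDA_ALLOC_CONF when CUDA is first initialized,
# so it must be set before any CUDA work happens.
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb=4096'

import torch  # imported only after the allocator config is in place
print(os.environ['PYTORCH_CUDA_ALLOC_CONF'])  # sanity check: max_split_size_mb=4096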

Thanks AbdusalamBande,
I’ve tried 4096, 1024, 2048, 8192, … and I always get the same error:

Traceback (most recent call last):
  File "train-monet_giverny.py", line 200, in <module>
    train()
  File "train-monet_giverny.py", line 117, in train
    generated_features = VGG(generated_batch.add(imagenet_neg_mean))
  File "C:\Users\smartTour\tableaux\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\smartTour\Documents\Train-Relook\vgg.py", line 46, in forward
    x = layer(x)
  File "C:\Users\smartTour\tableaux\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\smartTour\tableaux\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\smartTour\tableaux\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 200.00 MiB (GPU 0; 24.00 GiB total capacity; 3.99 GiB already allocated; 17.27 GiB free; 4.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I don’t really get how to use all the VRAM I have.

It seems you are limiting the device memory via torch.cuda.set_per_process_memory_fraction, since the allowed stat is shown. It points to 24 GiB, but could you remove it nevertheless?

Also, does:

import torch

print(torch.cuda.memory_allocated()/1024**2)
# 0.0
x = torch.randn(1024**3, device="cuda")
print(torch.cuda.memory_allocated()/1024**2)
# 4096.0
x = torch.randn(2 * 1024**3, device="cuda")
print(torch.cuda.memory_allocated()/1024**2)
# 8192.0

work?

Thanks ptrblck,

I’m not using torch.cuda.set_per_process_memory_fraction.

Your code does not work. Always the same error: memory limited at 4 GiB… Here is the error:

0.0
4096.0
Traceback (most recent call last):
  File "test_memory.py", line 10, in <module>
    x = torch.randn(2 * 1024**3, device="cuda")
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 0; 24.00 GiB total capacity; 4.00 GiB already allocated; 18.77 GiB free; 4.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Your code is using set_per_process_memory_fraction, since the allowed keyword wouldn’t be shown otherwise:

x = torch.randn(1024**4, device="cuda")
# OutOfMemoryError: CUDA out of memory. Tried to allocate 4096.00 GiB (GPU 0; 23.69 GiB total capacity; 0 bytes already allocated; 22.01 GiB free; 0 bytes reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

torch.cuda.set_per_process_memory_fraction(1.0)
x = torch.randn(1024**4, device="cuda")
# OutOfMemoryError: CUDA out of memory. Tried to allocate 4096.00 GiB (GPU 0; 23.69 GiB total capacity; 0 bytes already allocated; 22.00 GiB free; 23.69 GiB allowed; 0 bytes reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Thanks ptrblck,

Of course, you’re right :wink:
I tested set_per_process_memory_fraction two days ago, but since it was not working I had commented it out.
I trashed the contents of the __pycache__ folder, and the ‘allowed’ keyword disappeared, but not the error.

I also tried calling

torch.cuda.set_per_process_memory_fraction(1.0, None)

right after the import, and now the training is starting :+1:
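For reference, this is roughly how my script starts now (just a sketch; the rest of the training code is unchanged):

import torch

# first CUDA-related call, before seeds, device selection or any other torch settings
torch.cuda.set_per_process_memory_fraction(1.0, None)

# ... seeds, device selection and the rest of the script follow as before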

Two days ago I had called it in my training function, after some torch settings. That seems to be the reason it was not working…

I’ll let you know in 24 hours, at the end of the training.

Thanks for confirming!

This is really weird. Does it mean you had set it to a specific limit in the past, saw the issue, removed it, and were still unable to allocate more than 4 GB?
After removing the pycache (could you describe what exactly you deleted?), you are now able to allocate more than 4 GB?

Actually, I had tried to play with this limit, but I placed it after some torch settings and it was not working. I think this setting came too late in my code. It was after:

# Seeds
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
np.random.seed(SEED)
random.seed(SEED)

# Device
device = ("cuda" if torch.cuda.is_available() else "cpu")

Nevertheless, my training code hit the OOM error without it: it was unable to allocate more than 4 GiB. I had tried torch.cuda.set_per_process_memory_fraction, but not in the right place, so I gave up on it and commented it out. But when you wrote about it this morning, I tried putting it just after the import, as the very first setting, and it works.
I never hit this error with the GTX 1080 Ti. There is no need to set torch.cuda.set_per_process_memory_fraction with that card.
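To double-check that the 4 GiB cap is really gone, a small test like this helps (just a sketch, assuming a single GPU):

import torch

# driver-level free/total memory on the current device, in bytes
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1024**3:.2f} GiB, total: {total / 1024**3:.2f} GiB")

# try to allocate ~8 GiB, well past the old 4 GiB limit
x = torch.empty(8 * 1024**3, dtype=torch.uint8, device="cuda")
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")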

In my code I import three Python files with classes and functions:

import vgg
import experimental
import utils

I have deleted the three related files: experimental.cpython-38.pyc, vgg.cpython-38.pyc and utils.cpython-38.pyc.
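In case it helps someone, a quick sketch to clear every __pycache__ folder under the project directory instead of hunting down the .pyc files by hand:

import pathlib
import shutil

# remove all cached bytecode below the current folder
for cache_dir in pathlib.Path(".").rglob("__pycache__"):
    shutil.rmtree(cache_dir)
    print(f"removed {cache_dir}")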

Actually, my code is crashing again for another reason… I will investigate.

OK, let me know if you are still seeing issues preventing you from allocating more than 4 GB of memory, as this setting should not be “sticky” between runs, and I would consider it a bug.
I was also experimenting with it in my setup, but wasn’t able to reproduce the issue.

As I said in a previous post, I had another error, this time a CPUAllocator memory error. When I searched for this error, I found that increasing the pagefile size can solve it.
And I discovered that the pagefile size was set to 0 on my computer. I don’t remember why I had done that, but letting Windows manage it automatically solved the CPUAllocator error.
Today I also tried running my code without the

torch.cuda.set_per_process_memory_fraction(1.0, None)

And it works now. No 4 GiB limitation, no OOM error.
It seems I need a pagefile size greater than 0 to avoid this error on the VRAM. I don’t know if PyTorch already warns users about this, but it could…
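For anyone who wants to check this from Python, a small sketch (assuming the psutil package is installed; on Windows its swap numbers correspond to the pagefile):

import psutil

swap = psutil.swap_memory()
print(f"pagefile total: {swap.total / 1024**3:.2f} GiB, used: {swap.used / 1024**3:.2f} GiB")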