Unable to allocate CUDA memory, when there is enough cached memory

If you are sure that you don’t need the process, you could try to kill it, but please make sure it isn’t a process your system still needs.

torch.cuda.empty_cache() shouldn’t help, as it would only empty the CUDA memory cache, which would then trigger expensive cudaMalloc calls and would thus slow down your code.
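For illustration, a minimal sketch (arbitrary tensor sizes; `memory_reserved()` is called `memory_cached()` in older PyTorch releases) showing what the caching allocator holds and what `empty_cache()` actually releases:

```python
import torch

# Arbitrary allocation just to populate the caching allocator.
x = torch.randn(1024, 1024, device="cuda")                          # ~4 MiB tensor
print(torch.cuda.memory_allocated() / 1024**2, "MiB allocated")      # held by tensors
print(torch.cuda.memory_reserved() / 1024**2, "MiB reserved")        # held by the cache

del x  # the tensor is freed, but the allocator keeps the block in its cache
print(torch.cuda.memory_allocated() / 1024**2, "MiB allocated after del")
print(torch.cuda.memory_reserved() / 1024**2, "MiB still reserved")

torch.cuda.empty_cache()  # hands cached blocks back to the driver, so the
                          # next allocation needs a fresh (slow) cudaMalloc
print(torch.cuda.memory_reserved() / 1024**2, "MiB reserved after empty_cache")
```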

It just stops the code that I’m running, right? It won’t change my code or delete my original dataset.

Could you please explain in detail how to release the GPU memory to avoid this issue before I run a new project, e.g. via nvidia-smi?

It depends on the process you are stopping. If the GPU is used to visualize your desktop, this process might be needed, unless you are working on a server etc. Killing a process (especially with -9) might result in data loss, as the process might not have a chance to save its work, so you should be careful with it.

nvidia-smi will show you the used memory on the device (and the processes using the memory, if possible). If you don’t need these processes, you could close them to save memory, but that depends on your system and on which processes they are.
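If you prefer to do this from Python instead of the terminal, a small sketch (assuming nvidia-smi is available on the PATH) that lists the compute processes and their memory usage:

```python
import subprocess

# Query the processes currently holding GPU memory via nvidia-smi.
# Fields: process id, process name, used GPU memory.
out = subprocess.check_output(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader"],
    universal_newlines=True)

for line in out.strip().splitlines():
    print(line)  # e.g. "12345, python, 2345 MiB"
```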

Hi, I got a similar problem: after a fresh restart of the PC I was only able to allocate 3 GB of my 8 GB NVIDIA GPU.

So for me it worked to remove everything from the Windows autorun, e.g. Steam, Java, etc.

Also, having an Ubuntu PC would probably work well :smiley: I’ve never had these kinds of problems with my Ubuntu PC.

I solved this problem by increasing the batch size.

@ptrblck @smth, I am working with two 3090 GPUs, but I still don’t know why it’s showing an OOM error, and nvidia-smi shows this!

This is the exact error: RuntimeError: CUDA out of memory. Tried to allocate 4.00 MiB (GPU 0; 23.70 GiB total capacity; 18.06 GiB already allocated; 5.56 MiB free; 838.00 KiB cached)

By the way, this is what is shown when the training terminated with the RuntimeError.

Could you post a minimal, executable code snippet which would reproduce the issue, i.e. running out of memory while the GPU still has enough free memory, please?

Sorry, I cannot share the exact code, but somewhere in torch.autograd I had used retain_graph=True; will that affect it?

@ptrblck, please check.


In spite of the process being terminated, nearly 4 GB of memory on each GPU is still occupied.

Yes, it can affect it, as you might be increasing the memory usage in each iteration by keeping the computation graph.
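A minimal sketch (hypothetical model and loop, not the original training code) of this pattern: keeping a reference to the accumulated loss keeps every iteration’s graph and its saved activations alive, and retain_graph=True is what allows backward to keep running through those old graphs instead of freeing them.

```python
import torch
import torch.nn as nn

# Hypothetical two-layer model; the intermediate activation is saved in the
# graph for the backward pass.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)).cuda()
x = torch.randn(64, 1024, device="cuda")

total_loss = 0.
for i in range(100):
    out = model(x)
    # Accumulating the (attached) loss keeps every previous iteration's graph
    # and its saved activations alive; retain_graph=True then allows backward
    # to be called repeatedly through them, so allocated memory grows per step.
    total_loss = total_loss + out.mean()
    total_loss.backward(retain_graph=True)
    print("iter {}: {:.1f} MB allocated".format(
        i, torch.cuda.memory_allocated() / 1024**2))
```

Detaching or summing the plain loss values (e.g. `total_loss += loss.item()`) instead of the attached tensors would avoid this growth.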

@ptrblck, but without using retain_graph I was getting None for the grad of some variables.

Btw, it ran OOM before even finishing one iteration.

I restarted the training after killing all PIDs which were occupying GPU memory, but it didn’t help.

Experiment dir : search-EXP-ab1-20211023-041336
10/23 04:13:36 AM gpu device = 0,1
10/23 04:13:36 AM args = Namespace(arch_learning_rate=0.0003, arch_weight_decay=0.001, batch_size=8, cutout=False, cutout_length=16, data='/voyager-volume/code_1_test/Original_images', drop_path_prob=0.3, epochs=50, gpu='0,1', grad_clip=5, init_channels=16, is_parallel=1, layers=8, learning_rate=0.025, learning_rate_feature_extractor=0.025, learning_rate_head_g=0.025, learning_rate_min=0.001, model_path='saved_models', momentum=0.9, num_classes=31, report_freq=50, save='search-EXP-ab1-20211023-041336', seed=2, source='amazon', target='dslr', train_portion=0.5, unrolled=False, weight_decay=0.0003, weight_decay_fe=0.0003, weight_decay_hg=0.0003)

10/23 04:55:44 AM param size = 0.297522MB
10/23 04:55:44 AM epoch 0 lr 2.500000e-02
10/23 04:55:44 AM genotype = Genotype(normal=[('max_pool_3x3', 1), ('skip_connect', 0), ('max_pool_3x3', 0), ('skip_connect', 2), ('dil_conv_5x5', 2), ('dil_conv_5x5', 0), ('sep_conv_3x3', 1), ('dil_conv_3x3', 4)], normal_concat=range(2, 6), reduce=[('avg_pool_3x3', 1), ('sep_conv_3x3', 0), ('dil_conv_3x3', 1), ('dil_conv_3x3', 0), ('skip_connect', 2), ('max_pool_3x3', 1), ('max_pool_3x3', 3), ('dil_conv_3x3', 2)], reduce_concat=range(2, 6))
tensor([[0.1250, 0.1250, 0.1247, 0.1251, 0.1250, 0.1251, 0.1250, 0.1251],
[0.1250, 0.1252, 0.1249, 0.1250, 0.1251, 0.1251, 0.1248, 0.1249],
[0.1249, 0.1253, 0.1250, 0.1249, 0.1249, 0.1250, 0.1250, 0.1249],
[0.1251, 0.1249, 0.1250, 0.1251, 0.1250, 0.1249, 0.1250, 0.1251],
[0.1249, 0.1247, 0.1250, 0.1252, 0.1249, 0.1251, 0.1251, 0.1250],
[0.1249, 0.1250, 0.1250, 0.1251, 0.1250, 0.1250, 0.1248, 0.1253],
[0.1251, 0.1250, 0.1250, 0.1251, 0.1248, 0.1251, 0.1250, 0.1249],
[0.1252, 0.1250, 0.1249, 0.1249, 0.1250, 0.1249, 0.1251, 0.1251],
[0.1250, 0.1250, 0.1251, 0.1251, 0.1251, 0.1250, 0.1249, 0.1248],
[0.1250, 0.1252, 0.1249, 0.1250, 0.1251, 0.1249, 0.1248, 0.1251],
[0.1249, 0.1249, 0.1250, 0.1250, 0.1252, 0.1250, 0.1250, 0.1250],
[0.1251, 0.1249, 0.1249, 0.1250, 0.1249, 0.1252, 0.1251, 0.1251],
[0.1250, 0.1251, 0.1251, 0.1250, 0.1250, 0.1251, 0.1249, 0.1249],
[0.1251, 0.1247, 0.1249, 0.1251, 0.1252, 0.1249, 0.1253, 0.1249]],
device='cuda:0', grad_fn=)
tensor([[0.1252, 0.1251, 0.1250, 0.1249, 0.1251, 0.1249, 0.1249, 0.1250],
[0.1249, 0.1248, 0.1251, 0.1250, 0.1250, 0.1251, 0.1251, 0.1250],
[0.1251, 0.1249, 0.1249, 0.1251, 0.1249, 0.1250, 0.1251, 0.1250],
[0.1251, 0.1249, 0.1249, 0.1250, 0.1250, 0.1249, 0.1251, 0.1251],
[0.1249, 0.1251, 0.1248, 0.1250, 0.1250, 0.1250, 0.1251, 0.1251],
[0.1251, 0.1248, 0.1251, 0.1251, 0.1250, 0.1249, 0.1250, 0.1250],
[0.1250, 0.1251, 0.1250, 0.1251, 0.1249, 0.1249, 0.1250, 0.1249],
[0.1249, 0.1251, 0.1251, 0.1252, 0.1249, 0.1251, 0.1248, 0.1248],
[0.1252, 0.1250, 0.1250, 0.1251, 0.1247, 0.1249, 0.1252, 0.1250],
[0.1251, 0.1247, 0.1250, 0.1251, 0.1249, 0.1251, 0.1250, 0.1252],
[0.1250, 0.1249, 0.1249, 0.1251, 0.1250, 0.1252, 0.1250, 0.1249],
[0.1249, 0.1251, 0.1249, 0.1250, 0.1251, 0.1250, 0.1251, 0.1249],
[0.1250, 0.1252, 0.1247, 0.1247, 0.1249, 0.1252, 0.1250, 0.1251],
[0.1250, 0.1252, 0.1250, 0.1251, 0.1250, 0.1249, 0.1248, 0.1250]],
device='cuda:0', grad_fn=)
/opt/conda/lib/python3.6/site-packages/torch/tensor.py:292: UserWarning: non-inplace resize_as is deprecated
warnings.warn("non-inplace resize_as is deprecated")
Traceback (most recent call last):
File "train.py", line 363, in <module>
main()
File "train.py", line 181, in main
train_acc, train_obj = train(source_train_loader,source_val_loader,target_train_loader,target_val_loader, criterion,optimizer,optimizer_fe, optimizer_hg,lr,feature_extractor,head_g,model,architect,args.batch_size)
File "train.py", line 219, in train
_,domain_logits=model(input_img_source)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/voyager-volume/code_1_test/code_1_test/code_1_test/code_1_test/code_1_test/model_search.py", line 159, in forward
s0, s1 = s1, cell(s0, s1, weights,weights2)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/voyager-volume/code_1_test/code_1_test/code_1_test/code_1_test/code_1_test/model_search.py", line 85, in forward
s = sum(weights2[offset+j]*self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
File "/voyager-volume/code_1_test/code_1_test/code_1_test/code_1_test/code_1_test/model_search.py", line 85, in <genexpr>
s = sum(weights2[offset+j]*self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/voyager-volume/code_1_test/code_1_test/code_1_test/code_1_test/code_1_test/model_search.py", line 44, in forward
temp1 = sum(w * op(xtemp) for w, op in zip(weights, self._ops))
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 23.70 GiB total capacity; 22.83 GiB already allocated; 2.56 MiB free; 523.00 KiB cached)

As the error message explains, you have already allocated almost all of the GPU memory and won’t be able to allocate more.
Without a code snippet I’m not able to debug this any further.
Since you cannot share the code, try to check the memory usage at different places in your script and narrow down which parts of the code allocate this (apparently unexpected) memory.
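A minimal sketch of the kind of checks I mean (a dummy model stands in for the real training script; `memory_allocated()` and `max_memory_allocated()` are available in older PyTorch releases as well):

```python
import torch
import torch.nn as nn

def report(tag):
    # Print the currently allocated and the peak allocated memory on the default device.
    print("{:<22s} allocated {:7.1f} MiB | peak {:7.1f} MiB".format(
        tag,
        torch.cuda.memory_allocated() / 1024**2,
        torch.cuda.max_memory_allocated() / 1024**2))

# Dummy model and input standing in for the real training script.
model = nn.Linear(4096, 4096).cuda()
report("after model.cuda()")

x = torch.randn(256, 4096, device="cuda")
out = model(x)
report("after forward")

out.mean().backward()
report("after backward")
```

Sprinkling such calls after the forward pass, the backward pass, and the optimizer step of your actual training loop should show in which step the unexpected memory is allocated.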