How to free GPU memory (Nothing works)

KnHuq · August 15, 2018, 4:43am

Hi @smth, I have tried the suggestions from every discussion I could find, but I can't find a working solution for PyTorch. I am seeking your help.

How can I free up the memory on my GPU?

[time 1] used_gpu_memory = 10 MB
[time 2] model = ResNet(Bottleneck, [3, 3, 3, 3], 100).cuda()
[time 2] used_gpu_memory = 889 MB
[time 3] del model
[time 4] torch.cuda.empty_cache()
[time 4] used_gpu_memory = 627 MB

I also tried gc.collect(), but it does not help either. This memory is no longer referenced but stays occupied, and it is causing serious problems during training.

PTA · August 16, 2018, 7:42pm

Have you tried to terminate your script and remount the GPUs?

KnHuq · August 17, 2018, 12:36am

Yeah. When I terminate the script, the memory is freed, but the same thing happens when I run it again. @smth, any thoughts on that? I want to free that redundant memory while the script is running, so @PTA, remounting the GPU during the run does not make sense at all.

PTA · August 17, 2018, 10:53pm

I think it makes sense that some memory is still in use after torch.cuda.empty_cache(), just less than before. In my experience that should not affect your model training, unless there is a memory leak.
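
One rough way to check for a leak is to compare the memory held by live tensors before and after a piece of work. This is only a sketch: the helper names (allocated_mb, leaked_mb, example) are made up for illustration, and the guarded import is there just so the snippet also loads on a machine without PyTorch or without a GPU. torch.cuda.memory_allocated() counts only tensors handed out by PyTorch's allocator, so it can read lower than what an external GPU monitor shows for the whole process.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch load on machines without PyTorch
    torch = None


def allocated_mb():
    """Return MB held by live tensors on the current GPU, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    return torch.cuda.memory_allocated() / 1024 ** 2


def leaked_mb(build_and_drop):
    """Run a callable that creates and releases GPU objects, then report growth.

    A result near 0 suggests every tensor was released; a clearly positive
    value suggests something still holds a reference to a tensor.
    """
    before = allocated_mb()
    if before is None:
        return None
    build_and_drop()
    gc.collect()               # break reference cycles that may pin tensors
    torch.cuda.empty_cache()   # hand cached, unused blocks back to the driver
    return allocated_mb() - before


def example():
    # Hypothetical workload: put a tensor on the GPU, then delete it.
    x = torch.zeros(1024, 1024, device="cuda")
    del x


print(leaked_mb(example))
```

If the printed value stays near zero across repeated calls, the remaining usage is probably not coming from tensors in your script.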

KnHuq · August 18, 2018, 3:31am

That memory is occupied by something. I want to release it because I want to use it. I may be trying to put a new tensor on the GPU while that space is still occupied.

KnHuq · August 20, 2018, 12:44am

@smth Still waiting for your response.

smth · September 12, 2018, 6:56am

@KnHuq we do occupy some base memory of ~400MB per GPU for the CUDA context, CUDA RNG context, streams, etc., and ~200MB per GPU for cuDNN handles, etc.
It depends on the GPU model as well. The numbers I quoted are for a Volta V100 GPU.
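
As a rough sanity check, the figures from the first post can be set against these overheads. This is plain arithmetic on the numbers reported in this thread, not a measurement; the variable names are illustrative, and the ~400MB and ~200MB values are approximate and tied to the V100.

```python
# Figures reported in the first post of this thread, in MB.
used_at_start = 10
used_with_model = 889
used_after_free = 627

# Approximate per-GPU overheads quoted above for a V100, in MB.
cuda_context_mb = 400
cudnn_handles_mb = 200

freed_mb = used_with_model - used_after_free      # released by del + empty_cache()
still_held_mb = used_after_free - used_at_start   # what remains after cleanup
expected_overhead_mb = cuda_context_mb + cudnn_handles_mb
unexplained_mb = still_held_mb - expected_overhead_mb

print(freed_mb)              # 262
print(still_held_mb)         # 617
print(expected_overhead_mb)  # 600
print(unexplained_mb)        # 17
```

On these numbers, nearly all of the memory left after the cleanup matches the base overhead, and the small remainder is within the rounding of the approximate figures.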

111197 · December 4, 2019, 2:12am

Hi,
If I do ops like:

x = conv(x)
x = conv(x)

I found that the very first x is kept in memory even after it has been replaced by conv(x), so the total memory used is 3 times the size of x. Can I free the now-useless first x in a case like that?

ptrblck · December 4, 2019, 5:48am

Double post from here. Let's continue the discussion in the other topic.