CUDA 10.2 Out of memory

I run a model on a GTX 1080 Ti with CUDA 10.2 and PyTorch 1.5. When I synthesize audio output, I use with torch.no_grad(), torch.backends.cudnn.deterministic = False, torch.backends.cudnn.benchmark = False, torch.cuda.set_device(0), torch.cuda.empty_cache(), and os.system("sudo rm -rf ~/.nv"), but GPU memory still increases. It grows by about 10 MiB each time until it runs out of memory.
Can you help me solve this problem? Thank you very much.

If I understand the issue correctly, your memory usage is increasing in each iteration.
This might happen if you are storing tensors that are still attached to the computation graph, e.g. in a list.
Often you would like to append the loss to a list in order to calculate the mean for the epoch.
Since the loss tensor is attached to the computation graph, you would also store the complete graph in each iteration, which might eventually yield the OOM issue.

To detach the tensor properly, you could use:

losses.append(loss.cpu().detach().item())
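
For illustration, here is a minimal sketch of the pattern using a hypothetical toy model and random data (not your setup): storing the loss tensor itself keeps its graph alive across iterations, while storing the Python float does not.

import torch
import torch.nn as nn

# Hypothetical toy model and random data, only to illustrate the pattern.
model = nn.Linear(10, 1).cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

losses = []
for _ in range(100):
    data = torch.randn(32, 10, device='cuda')
    target = torch.randn(32, 1, device='cuda')

    optimizer.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()
    optimizer.step()

    # losses.append(loss)        # keeps the loss attached to its graph -> memory grows
    losses.append(loss.item())   # stores a plain Python float instead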

The memory increase should not come from the options you’ve posted.

The problem occurs during synthesis, not training, so I don't think it is related to the loss.

Could you post your code, so that we can have a look?
None of the mentioned settings should change the memory behavior and of course shouldn’t create / avoid a memory leak.

Here is my code to synthesize audio from mel spectrogram
pad_fn = torch.nn.ReplicationPad1d(
    self.config_gan["generator_params"].get("aux_context_window", 0)).to(torch.device("cuda"))
# Generate
with torch.no_grad():
    c = self.scaler.transform(mels[0])
    x = ()
    z = torch.randn(1, 1, len(c) * self.config_gan["hop_size"]).to(torch.device("cuda"))
    x += (z,)
    b = torch.from_numpy(c).unsqueeze(0).transpose(2, 1).to(torch.device("cuda"))
    c = pad_fn(b)
    x += (c,)
    y = self.model_gan(*x).view(-1).cpu().numpy()
y = y[:len(y) - 3000]
del z
torch.cuda.empty_cache()
del b
torch.cuda.empty_cache()
del c
torch.cuda.empty_cache()
out = io.BytesIO()
audio.save_wav(y, out, sr=hparams.sample_rate)
del y
torch.cuda.empty_cache()
del pad_fn
torch.cuda.empty_cache()
gc.collect()
with open('fmemory.txt', 'a') as f:
    f.writelines('del all ' + str(torch.cuda.memory_allocated() / 1024**2) + '\n')  # 5.27490234375
    f.writelines(str(torch.cuda.memory_cached() / 1024**2) + '\n')  # 6.0
self.model_gan.remove_weight_norm()
return out.getvalue()

Are you able to reproduce the memory increase using random data? If so, could you post the input data shape as well as all other shapes that would be necessary to reproduce this issue?

Here is the memory usage (in MiB) that I logged while synthesizing a text with 3 sentences:
Sentence 1:
before inference: allocated 5.10791015625, cached 6.0
inference: allocated 225.40185546875, cached 970.0
after inference: allocated 6.48193359375, cached 26.0
del all: allocated 6.48193359375, cached 26.0

Sentence 2:
before inference: allocated 5.10791015625, cached 6.0
inference: allocated 95.818359375, cached 398.0
after inference: allocated 5.6748046875, cached 6.0
del all: allocated 5.6748046875, cached 6.0

Sentence 3:
before inference: allocated 5.10791015625, cached 6.0
inference: allocated 31.6435546875, cached 122.0
after inference: allocated 5.27490234375, cached 6.0
del all: allocated 5.27490234375, cached 6.0

The memory doesn’t seem to grow in each iteration or am I missing something?
Depending on the size of your input, the current iteration might need more memory than the previous one, but the memory footprint seems to go down as expected after deleting the tensors.

Yes, the allocated and cached memory do not change, but the GPU memory usage still increases until it runs out of memory.

How can the GPU yield an OOM, if the allocated and cached memory is reduced in each iteration?
Are you hitting an “extremely large” input sample, which might be too big for your device?

Maybe it is a bug in PyTorch 1.5. I switched to PyTorch 1.0.1 and it works fine, without leaking GPU RAM as in PyTorch 1.5.

I’m still unsure how to interpret this statement:

How does the memory increase, if the allocated and cached memory is not changed?
Are you seeing the increase only via e.g. nvidia-smi?
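
If it helps, here is a minimal sketch for logging both views side by side around each synthesis call (assuming nvidia-smi is on the PATH and the PyTorch device index matches the nvidia-smi ordering):

import subprocess
import torch

def log_gpu_memory(tag, device=0):
    # PyTorch's view: memory used by tensors and held by the caching allocator.
    allocated = torch.cuda.memory_allocated(device) / 1024**2
    cached = torch.cuda.memory_cached(device) / 1024**2  # memory_reserved() in newer releases
    # Driver's view: total used memory on the device as reported by nvidia-smi,
    # which includes the CUDA context and cuDNN workspaces on top of the
    # caching allocator (plus any other processes using this GPU).
    smi = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.used', '--format=csv,noheader,nounits'])
    used = int(smi.decode().splitlines()[device])
    print(f'{tag}: allocated {allocated:.2f} MiB, cached {cached:.2f} MiB, nvidia-smi {used} MiB')

If the nvidia-smi number keeps growing while the allocated and cached values stay flat, the growth is happening outside the caching allocator.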

Yes, I see the increase in nvidia-smi.

Could you post a minimal, executable code snippet, which would show this behavior, so that we could debug it, please?

You can see here: https://github.com/kan-bayashi/ParallelWaveGAN/issues/160#issuecomment-639850145

The code is unfortunately not executable, as you are using private data.
Could you post an executable code snippet using random input data?
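Something along these lines would already be enough; the generator below is just a placeholder standing in for your actual ParallelWaveGAN model:

import torch
import torch.nn as nn

# Placeholder network standing in for the real generator; swap in the actual
# model to reproduce the reported behavior.
model = nn.Sequential(
    nn.Conv1d(80, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(64, 1, kernel_size=3, padding=1),
).cuda().eval()

for i in range(10):
    # Random "mel" input with a varying number of frames, mimicking different sentences.
    frames = torch.randint(200, 1000, (1,)).item()
    c = torch.randn(1, 80, frames, device='cuda')

    with torch.no_grad():
        y = model(c).view(-1).cpu().numpy()

    print(f'iter {i}: allocated {torch.cuda.memory_allocated() / 1024**2:.2f} MiB, '
          f'cached {torch.cuda.memory_cached() / 1024**2:.2f} MiB')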