Sending a tensor to CUDA commits a lot of memory that is never released

In an empty Jupyter notebook:

Current memory use is 2.6 GiB of 7.7 GiB, as seen in the Ubuntu System Monitor.

a = torch.tensor(1).cuda(0)

After running the command above, memory use jumps to 4.7 GiB.
I'm not able to release this memory other than by restarting the kernel. I tried del a, doing the CUDA work inside a function, and torch.cuda.empty_cache(), but none of these helps.

This only happens the first time I send a tensor to CUDA. This memory use strikes me as ironic, since I'm working on the GPU, and it limits how much I can send to CUDA because the committed RAM grows as batch sizes grow.

Working with tensors on .cpu() has none of these memory issues.
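For reference, here is a minimal sketch for quantifying the jump from inside the notebook rather than via the System Monitor, assuming Linux (the helper name resident_mib is mine, and the torch line is left commented out so the helper itself is framework-independent):

```python
def resident_mib():
    """Return this process's current resident set size in MiB (Linux /proc)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # /proc reports kB
    return 0.0

before = resident_mib()
# a = torch.tensor(1).cuda(0)  # run the suspect line here
after = resident_mib()
print(f"RSS before: {before:.0f} MiB, after: {after:.0f} MiB")
```

Measuring before and after the single .cuda(0) call isolates how much host RAM that one line commits.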

My questions are:

  1. Why is so much memory committed when sending a single-int tensor to the GPU?
  2. How can I release this memory without killing the kernel?

Sending a tensor to the GPU should not allocate that much system RAM.
Could you post some information about your setup?
I.e. which GPU you are using, the PyTorch version, how you installed it (built from source or binaries), and the local CUDA and cuDNN versions, if installed.

I suspect some just-in-time compilation might be going on in the background.
How long does this command take when you first run it?
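A rough way to check this is to time the first call against a second one; if one-off initialisation (such as JIT compilation) dominates, the first call should be much slower. A sketch, where send_to_gpu is a placeholder standing in for the actual .cuda(0) line:

```python
import time

def time_call(fn):
    """Return the wall-clock duration of fn() in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def send_to_gpu():
    pass  # placeholder for: a = torch.tensor(1).cuda(0)

first = time_call(send_to_gpu)
second = time_call(send_to_gpu)
print(f"first call: {first:.3f}s, second call: {second:.3f}s")
```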

It takes a second or so to run the first time.

NVIDIA driver 430.64
GeForce RTX 2070
torchvision 0.3.0 py37_cu10.0.130_1 (pytorch channel), installed with conda as part of the fastai install.
Cuda compilation tools, release 10.1, V10.1.105
I'm having trouble verifying the version of cuDNN.

Could you post the PyTorch version via print(torch.__version__)?

Thank you for your time on this


Could you update to the latest stable release and rerun the code, please?

I was able to get 1.3.1 working in a Jupyter notebook and 1.4.1 working with python3 in a terminal, using different conda environments.
The memory still goes up from 2.7 GiB to 4.7 GiB after running the code.

Could you run perf top while executing this line of code and check for any ptx calls?
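For reference, one way to do this, assuming perf is installed and you can run it as root (the PID 12345 is a placeholder for the notebook kernel's actual process ID, which you could find with e.g. pgrep -f ipykernel):

```shell
# Watch live symbols in the kernel process while running the .cuda(0) line;
# PTX-related entries would show up in the symbol column if NVIDIA's
# PTX JIT is active.
sudo perf top -p 12345

# Alternatively, record a short sample and search the report for PTX symbols:
sudo perf record -p 12345 -- sleep 10
sudo perf report --stdio | grep -i ptx
```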


I’ve managed to install perf, but I'm not sure how to check for ptx calls.
While running it, I can see one entry dominating with 15% overhead when I run the .cuda(0) command for the first time.

@ptrblck does the above answer your question? Sorry to bump. Still haven’t been able to resolve my issue.