+ 50% VRAM used on torch 1.3 compared to 1.2

1.3
https://colab.research.google.com/drive/104LtQ1zIioIOMQEPgVve77m5Rd4Gm0wU
1.2
https://colab.research.google.com/drive/1y4LF1a90PYfKvFgQw6fCMYKtIFK07mVr
Everything is the same except the torch version.
At the end, nvidia-smi shows 11 GB and 7 GB of memory used.

Why are there all these TensorFlow warnings? Do you use it as well?
There are many other files imported. Are you sure that you don’t have a difference in the model?

The TensorFlow warnings come from the hparams contrib object.
The codebase is exactly the same; the only difference is the PyTorch version.

Could you try reducing the code to a small 30-40 lines? It is very hard to give an answer here as we don’t even see all the files :confused:

Okay.
This is a copy-paste of the tutorial.
https://colab.research.google.com/drive/10YhV2QqG-pZLXbsTkJRE-hVcXtmxzBXS
https://colab.research.google.com/drive/1IDTOlw3U3ZzCvRAirpISc7UBG8M88oZU
700 MB vs 330 MB

Rerunning both notebooks above from scratch gives ~700 MB usage. If you rerun your notebook above from scratch, do you still see this?

Looks like it is related to the type of GPU: on a K80 both versions take ~300 MB, on a P4 it is 535 MB vs 595 MB, and on a P100 715 MB vs 777 MB. So this example is invalid.
Still, the example in the top post was made on the same GPU.
Maybe it is about RNN layers or zero padding; I will try to check later.

Hi,

So if you use different GPUs, I can think of a few reasons:

  • The CUDA driver behaves differently
  • Different architectures lead to a different size of code loaded on the device by the driver
  • The algorithms chosen by cudnn to do the compute are different and use a different amount of memory.

You can check how much memory is used when only doing torch.rand(1).cuda(). That will tell you how much memory is used by the driver + code alone.
You can use torch.cuda.memory_cached (see the docs) to know how much memory PyTorch holds onto for Tensor allocation.
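As a minimal sketch of those two checks (assuming a single GPU; memory_cached was later renamed memory_reserved):

import torch

# Creating the first CUDA tensor initializes the CUDA context, so nvidia-smi
# afterwards shows roughly the driver + device code overhead.
a = torch.rand(1).cuda()

# Memory occupied by live Tensors.
print(torch.cuda.memory_allocated())

# Memory the caching allocator holds onto for future Tensor allocations.
print(torch.cuda.memory_cached())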

Just checked on 1.4; the problem is still there.
After the same code run, the old version has 8 GB allocated and the new one 11 GB, on the same GPU type.

By allocated, do you mean the value given by torch.cuda.memory_allocated()?

Yes, you can check the notebooks from the top post.
I added this command to them.

It is reserved, not allocated. The reserved memory can increase for many reasons (more code in the binary, caching allocator behavior, different GPU, different cudnn, etc.). The allocated memory corresponds only to Tensors (and that one should be constant).
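A small illustration of that difference (just a sketch, assuming a CUDA device is available): freeing a tensor drops the allocated counter, but the caching allocator keeps the block reserved.

import torch

x = torch.empty(1024, 1024, device="cuda")  # ~4 MB of float32
print(torch.cuda.memory_allocated())  # counts the live tensor
print(torch.cuda.memory_cached())     # what the allocator has reserved

del x
print(torch.cuda.memory_allocated())  # back to ~0
print(torch.cuda.memory_cached())     # the block stays reserved for reuse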

Okay, the allocated memory is constant, but what should I do with it?
I can’t go to a 1.3+ version, because I will hit an out-of-memory error.
It is not about the GPU type; I got the same on a V100.

Are you running into an out of memory error or are you concerned about the reserved memory?

I’m not concerned, but if I launch the same code on 1.3 I will get an OOM quickly; with a smaller batch size it is fine.

I might have misunderstood the previous post, but I thought the allocated memory is the same?

From the notebook, the allocated memory is the same.
Shouldn’t I avoid the OOM if allocated is the same?

The fact that the allocated memory is the same means that we use exactly the same amount of memory for Tensors. So that is good.
What is not tracked in this is:

  • cuda driver internal memory
  • cudnn buffers

To check the first, you can see how much cuda memory is used when you only do:

import torch
a = torch.rand(1).cuda()

To check the second, it is a bit harder. You can try to set torch.backends.cudnn.deterministic = True. This forces cudnn to use its default deterministic algorithms, which are usually not too memory hungry.
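Roughly like this (a sketch only; model and inp stand in for the actual model and input from the notebook):

import torch

# Force cudnn to use its default deterministic algorithms, which usually need
# smaller workspace buffers than the fastest ones picked by benchmarking.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# out = model(inp)        # run the same training step as before
# out.sum().backward()
print(torch.cuda.max_memory_allocated())  # peak Tensor memory
print(torch.cuda.memory_cached())         # memory held by the allocator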

With only one tensor, both versions have 2 GB of memory reserved.
With deterministic, it is still 7 GB vs 11 GB.

So I guess for this model, the peak memory usage increased between 1.2 and later versions.
I am not familiar with these models. Do you have a short description of their structure and the modules used? Or could you write a small nn.Module that looks similar (and reproduces the memory increase)?
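For example, something in this spirit (a hypothetical sketch only, assuming the model is RNN-based with zero-padded batches as mentioned above; all names and sizes are placeholders, not the real model):

import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    def __init__(self, vocab=1000, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

model = TinyRNN().cuda()
x = torch.randint(0, 1000, (64, 200), device="cuda")  # a zero-padded batch
model(x).sum().backward()
print(torch.cuda.max_memory_allocated(), torch.cuda.memory_cached())

Running something like this under 1.2 and 1.3 on the same GPU would show whether the increase comes from the RNN layers themselves.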