+ 50% VRAM used on torch 1.3 compared to 1.2

1.3
https://colab.research.google.com/drive/104LtQ1zIioIOMQEPgVve77m5Rd4Gm0wU
1.2
https://colab.research.google.com/drive/1y4LF1a90PYfKvFgQw6fCMYKtIFK07mVr
Everything is the same except the torch version.
At the end, nvidia-smi shows 11 GB and 7 GB of memory used.

Why are there all these TensorFlow warnings? Do you use it as well?
There are many other files imported. Are you sure that you don’t have a difference in the model?

The TensorFlow warnings come from the hparams contrib object.
The codebase is exactly the same; the only difference is the PyTorch version.

Could you try reducing the code to a small 30-40 lines? It is very hard to give an answer here as we don’t even see all the files :confused:

Okay.
This is a copy-paste of the tutorial.
https://colab.research.google.com/drive/10YhV2QqG-pZLXbsTkJRE-hVcXtmxzBXS
https://colab.research.google.com/drive/1IDTOlw3U3ZzCvRAirpISc7UBG8M88oZU
700 MB vs 330 MB

Rerunning both notebooks above from scratch gives ~700 MB usage. If you rerun your notebook above from scratch, do you still see this?

Looks like it is related to the type of GPU: on a K80 both versions take ~300 MB, on a P4 it is 535 MB vs 595 MB, and on a P100 715 MB vs 777 MB. So this example is invalid.
Still, the example in the top post was made on the same GPU.
Maybe it is about RNN layers or zero padding; I will try to check later.

Hi,

So if you use different GPUs, I can think of a few reasons:

  • The CUDA driver behaves differently
  • Different architectures lead to a different size of code loaded on the device by the driver
  • The algorithms chosen by cudnn to do the compute are different and use a different amount of memory.

You can check how much memory is used when only doing torch.rand(1).cuda(). That will tell you how much memory is used by the driver + code alone.
You can use torch.cuda.memory_cached (see the docs) to know how much memory PyTorch holds onto for Tensor allocation.
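As a minimal sketch of those two checks (assuming a single GPU; memory_cached was later renamed memory_reserved):

import torch

# Creating the first CUDA tensor initializes the CUDA context, so nvidia-smi
# afterwards shows roughly the driver + device code overhead.
a = torch.rand(1).cuda()

# Memory occupied by live Tensors.
print(torch.cuda.memory_allocated())

# Memory the caching allocator holds onto for future Tensor allocations.
print(torch.cuda.memory_cached())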

Just checked on 1.4; the problem is still there.
After the same code run, the old version has 8 GB allocated and the new one 11 GB, on the same GPU type.

By allocated, do you mean the value given by torch.cuda.memory_allocated()?

Yes, you can check the notebooks from the top post.
I added this command to them.

It is reserved, not allocated. The reserved memory can increase for many reasons (more code in the binary, caching allocator behavior, different GPU, different cudnn, etc.). The allocated memory corresponds only to Tensors (and that one should be constant).
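A small illustration of that difference (just a sketch, assuming a CUDA device is available): freeing a tensor drops the allocated counter, but the caching allocator keeps the block reserved.

import torch

x = torch.empty(1024, 1024, device="cuda")  # ~4 MB of float32
print(torch.cuda.memory_allocated())  # counts the live tensor
print(torch.cuda.memory_cached())     # what the allocator has reserved

del x
print(torch.cuda.memory_allocated())  # back to ~0
print(torch.cuda.memory_cached())     # the block stays reserved for reuse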

Okay, the allocated memory is constant, but what should I do with it?
I can’t go to a 1.3+ version, because I will hit an out-of-memory error.
It is not about the GPU type; I got the same on a V100.

Are you running into an out of memory error or are you concerned about the reserved memory?

I’m not concerned, but if I launch the same code on 1.3 I will get an OOM quickly; with a smaller batch size it is fine.

I might have misunderstood the previous post, but I thought the allocated memory is the same?

From the notebook, the allocated memory is the same.
Shouldn’t I avoid the OOM if allocated is the same?

The fact that the allocated memory is the same means that we use exactly the same amount of memory for Tensors. So that is good.
What is not tracked in this is:

  • cuda driver internal memory
  • cudnn buffers

To check the first, you can see how much cuda memory is used when you only do:

import torch
a = torch.rand(1).cuda()

To check the second, it is a bit harder. You can try to set torch.backends.cudnn.deterministic = True. This forces cudnn to use its default deterministic algorithms, which are usually not too memory hungry.
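Roughly like this (a sketch only; model and inp stand in for the actual model and input from the notebook):

import torch

# Force cudnn to use its default deterministic algorithms, which usually need
# smaller workspace buffers than the fastest ones picked by benchmarking.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# out = model(inp)        # run the same training step as before
# out.sum().backward()
print(torch.cuda.max_memory_allocated())  # peak Tensor memory
print(torch.cuda.memory_cached())         # memory held by the allocator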

With only one tensor, both versions have 2 GB of memory reserved.
With deterministic, it is still 7 GB vs 11 GB.

So I guess for this model, the peak memory usage increased between 1.2 and later versions.
I am not familiar with these models. Do you have a short description of their structure and the modules used? Or could you write a small nn.Module that looks similar (and reproduces the memory increase)?
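For example, something in this spirit (a hypothetical sketch only, assuming the model is RNN-based with zero-padded batches as mentioned above; all names and sizes are placeholders, not the real model):

import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    def __init__(self, vocab=1000, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

model = TinyRNN().cuda()
x = torch.randint(0, 1000, (64, 200), device="cuda")  # a zero-padded batch
model(x).sum().backward()
print(torch.cuda.max_memory_allocated(), torch.cuda.memory_cached())

Running something like this under 1.2 and 1.3 on the same GPU would show whether the increase comes from the RNN layers themselves.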