Hello,
newbie here in dire need of help.
The problem
I just got a brand new RTX 2070 8 GB and while it’s certainly fast, I don’t seem to be able to utilize its entire memory capacity. I’ve noticed this while running models but realized I need a more objective way to test it.
What I’ve tried
I made a tiny Jupyter notebook in which I create tensors of a given size on the GPU. This way I can precisely check how much memory I can use. To make the example below clearer, let me just mention that I also have a GTX 1060 6 GB and I’m running Windows 10. I’ve replicated the problem on both the nightly and stable versions of PyTorch 1.0 CUDA 10.
The approach was to find the maximum tensor size that fits on each card and see whether I can fill the GPU memory this way. I’m printing some CUDA memory metrics, but since I’m fairly (make that extremely) new to this, I’m also monitoring GPU memory usage on both cards with GPU-Z. The 1060 is the primary card and usually has about 810 MB of memory occupied; the idle 2070 has only 4 MB occupied.
import torch
def device_info(device):
    # Just for showing info
    device_name = torch.cuda.get_device_name(device)
    print(f"Info on {device_name}")
    print(f"CUDA capability {torch.cuda.get_device_capability(device)}")
    print(f"Maximum GPU memory usage by tensors on {device_name}: {torch.cuda.max_memory_allocated(device)/1e9} GB")
    print(f"Current GPU memory usage by tensors on {device_name}: {torch.cuda.memory_allocated(device)/1e9} GB")
    print(f"Maximum GPU cached memory usage on {device_name}: {torch.cuda.max_memory_cached(device)/1e9} GB")
    print(f"Current GPU cached memory usage on {device_name}: {torch.cuda.memory_cached(device)/1e9} GB\n\n")

print(f"Using cuDNN version {torch.backends.cudnn.version()}")
# n_2070 = int(1.69605e9) # Does not work and produces amusing reason for not working
n_2070 = int(1.696e9) # Maximum size that works
n_1060 = int(1.275e9)
device_2070 = torch.device("cuda:0")
device_1060 = torch.device("cuda:1")
print(f"Estimated tensor size for 2070: {(n_2070 * 4 / 1e9):.3f} GB")
print(f"Estimated tensor size for 1060: {(n_1060 * 4 / 1e9):.3f} GB")
Output:
Using cuDNN version 7401
Estimated tensor size for 2070: 6.784 GB
Estimated tensor size for 1060: 5.100 GB
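For clarity, those size estimates come from simple arithmetic: a dense float32 tensor of n elements occupies n × 4 bytes.

```python
def tensor_size_gb(n_elements, bytes_per_element=4):
    """Estimated size in GB of a dense tensor (float32 = 4 bytes per element)."""
    return n_elements * bytes_per_element / 1e9

# The two element counts used above:
print(f"{tensor_size_gb(int(1.696e9)):.3f} GB")  # 6.784 GB (2070)
print(f"{tensor_size_gb(int(1.275e9)):.3f} GB")  # 5.100 GB (1060)
```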
torch.empty((n_1060, 1), device=device_1060)
torch.empty((n_2070, 1), device=device_2070)
device_info(device_2070)
device_info(device_1060)
Output:
Info on GeForce RTX 2070
CUDA capability (7, 5)
Maximum GPU memory usage by tensors on GeForce RTX 2070: 6.784156672 GB
Current GPU memory usage by tensors on GeForce RTX 2070: 6.784024576 GB
Maximum GPU cached memory usage on GeForce RTX 2070: 6.785073152 GB
Current GPU cached memory usage on GeForce RTX 2070: 6.785073152 GB
Info on GeForce GTX 1060 6GB
CUDA capability (6, 1)
Maximum GPU memory usage by tensors on GeForce GTX 1060 6GB: 5.10001152 GB
Current GPU memory usage by tensors on GeForce GTX 1060 6GB: 0.0 GB
Maximum GPU cached memory usage on GeForce GTX 1060 6GB: 5.10001152 GB
Current GPU cached memory usage on GeForce GTX 1060 6GB: 5.10001152 GB
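As an aside, the trial and error I did to find n_2070 could be automated with a simple binary search. Here’s a sketch with the CUDA allocation mocked out — the fake_alloc stand-in and its hidden limit are made up for illustration; on a real GPU you’d instead wrap torch.empty in a try/except around the OOM RuntimeError:

```python
def max_allocatable(try_alloc, lo=0, hi=2_000_000_000):
    """Binary-search the largest element count n for which try_alloc(n) succeeds."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if try_alloc(mid):
            lo = mid       # mid elements fit; try larger
        else:
            hi = mid - 1   # mid elements fail; try smaller
    return lo

# Stand-in for the real allocation attempt. On an actual GPU this would try
# torch.empty((n, 1), device="cuda:0") and return False on a CUDA OOM error.
# The hidden limit below just mimics my 2070's observed maximum.
LIMIT = 1_696_000_000  # hypothetical cap for illustration
def fake_alloc(n):
    return n <= LIMIT

print(max_allocatable(fake_alloc))  # 1696000000
```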
Further remarks
According to GPU-Z, the 2070 now has 6831 MB occupied and the 1060 has 5942 MB occupied. In other words, the 1060 used essentially all of its memory, while the 2070 still has more than 1 GB free.
If I uncomment the first line, n_2070 = int(1.69605e9), I get the following mind-boggling error:
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 KiB (GPU 0; 8.00 GiB total capacity; 6.32 GiB already allocated; 1.85 MiB free; 0 bytes cached)
A few questions here:
- Why are there only 1.85 MiB free when the card has 8 GiB of total capacity and only 6.32 GiB are allocated?
- And if 1.85 MiB are free, why can’t a 1 MiB allocation succeed?
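To spell out why the first question bugs me, here is the arithmetic straight from the error message — roughly 1.68 GiB is reported as neither allocated nor free:

```python
# Numbers taken directly from the error message, converted to GiB:
total_gib = 8.00
allocated_gib = 6.32
free_gib = 1.85 / 1024  # 1.85 MiB expressed in GiB

unaccounted_gib = total_gib - allocated_gib - free_gib
print(f"Neither allocated nor free: {unaccounted_gib:.2f} GiB")  # ~1.68 GiB
```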
Other things I tried
- I want to emphasize that I tried my best to find an answer to this problem before posting but couldn’t find one.
- I tried removing the 1060 and using the 2070 as the only (primary) card. I removed the drivers and performed a fresh install, and I also deleted the entire environment and recreated it. No matter what I did, I could never get past 7 GB of memory usage on the 2070 according to GPU-Z, while I could easily fill the 1060’s memory.
- As mentioned in the beginning of the post, I tried both stable and nightly versions of PyTorch 1.0 CUDA 10.
Please advise. Is the card at fault? Should I return it? Is there anything to do? I’m getting quite desperate with this situation.
Best,
Mircea