Bug in PyTorch GPU memory handling?

I’m running a test modelled on this Stack Overflow post: https://stackoverflow.com/questions/57496285/why-is-the-memory-in-gpu-still-in-use-after-clearing-the-object


import torch
import GPUtil

def memory_test(device):
    # Allocate a ~2.4 GB float tensor on the given GPU. The tensor goes out
    # of scope when the function returns, so the memory stays in PyTorch's cache.
    x = torch.rand(10000, 300, 200).cuda(device)

memory_test(1)
GPUtil.showUtilization()
torch.cuda.empty_cache()
GPUtil.showUtilization()

Output:

| ID | GPU  | MEM |
-------------------
|  0 |   0% |  0% |
|  1 |   0% | 26% |
| ID | GPU  | MEM |
-------------------
|  0 |   0% |  6% |
|  1 |  43% |  6% |

After torch.cuda.empty_cache() releases the cached blocks, it is known to keep a small amount of memory (here 6%) allocated as overhead. But it appears to allocate that same overhead on GPU 0 as well, even though GPU 0 was never used. Is this a bug?
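For reference, here is how one could cross-check with PyTorch's own per-device counters instead of GPUtil (a minimal sketch; torch.cuda.memory_reserved assumes PyTorch 1.4+, where it replaced the older memory_cached):

import torch

def report(device):
    # Bytes currently held by live tensors on this device
    allocated = torch.cuda.memory_allocated(device)
    # Bytes the caching allocator has reserved from the driver
    reserved = torch.cuda.memory_reserved(device)
    print(f"cuda:{device}: allocated={allocated} reserved={reserved}")

for d in range(torch.cuda.device_count()):
    report(d)

Note these counters only cover PyTorch's caching allocator, not everything GPUtil sees at the driver level.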

FaultyBagnose