Different memory allocation on GTX 1080 Ti, Tesla K80, and Tesla V100 for the same PyTorch model

I have tried loading a DistilBERT model in PyTorch on three different GPUs (GeForce GTX 1080 Ti, Tesla K80, Tesla V100). According to the PyTorch CUDA memory stats, the memory consumption is identical on all of these GPUs (534MB), but “nvidia-smi” shows a different memory consumption for each of them (GTX 1080 Ti: 1181MB, Tesla K80: 898MB, Tesla V100: 1714MB).

I chose the V100 hoping to accommodate more processes because of its extra memory. Because of this overhead, I am not able to accommodate any more processes on the V100 than on the K80.

Versions: Python 3.6.11, transformers==2.3.0, torch==1.6.0

Any help would be appreciated.

The following are the memory consumption readings for each GPU.

----------------GTX 1080 Ti---------------------

2020-10-19 02:11:04,147 - CE - INFO - torch.cuda.max_memory_allocated() : 514.33154296875
2020-10-19 02:11:04,147 - CE - INFO - torch.cuda.memory_allocated() : 514.33154296875
2020-10-19 02:11:04,147 - CE - INFO - torch.cuda.memory_reserved() : 534.0
2020-10-19 02:11:04,148 - CE - INFO - torch.cuda.max_memory_reserved() : 534.0

The output of “nvidia-smi”:

2020-10-19 02:11:04,221 - CE - INFO - | ID | Name                | Serial          | UUID                                     || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
2020-10-19 02:11:04,222 - CE - INFO - |  0 | GeForce GTX 1080 Ti | [Not Supported] | GPU-58d5d4d3-07a1-81b4-ba67-8d6b46e342fb ||       50C |       15% |          11% ||      11178MB |      1181MB |      9997MB || Disabled     | Disabled       |

----------------Tesla K80---------------------

2020-10-19 12:15:37,030 - CE - INFO - torch.cuda.max_memory_allocated() : 514.33154296875
2020-10-19 12:15:37,031 - CE - INFO - torch.cuda.memory_allocated() : 514.33154296875
2020-10-19 12:15:37,031 - CE - INFO - torch.cuda.memory_reserved() : 534.0
2020-10-19 12:15:37,031 - CE - INFO - torch.cuda.max_memory_reserved() : 534.0

The output of “nvidia-smi”:

2020-10-19 12:15:37,081 - CE - INFO - | ID | Name      | Serial        | UUID                                     || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
2020-10-19 12:15:37,081 - CE - INFO - |  0 | Tesla K80 | 0324516191902 | GPU-1e7baee8-174b-2178-7115-cf4a063a8923 ||       50C |        3% |           8% ||      11441MB |       898MB |     10543MB || Disabled     | Disabled       |

----------------Tesla V100---------------------

2020-10-20 08:18:42,952 - CE - INFO - torch.cuda.max_memory_allocated() : 514.33154296875
2020-10-20 08:18:42,952 - CE - INFO - torch.cuda.memory_allocated() : 514.33154296875
2020-10-20 08:18:42,953 - CE - INFO - torch.cuda.memory_reserved() : 534.0
2020-10-20 08:18:42,953 - CE - INFO - torch.cuda.max_memory_reserved() : 534.0

The output of “nvidia-smi”:

2020-10-20 08:18:43,020 - CE - INFO - | ID | Name                 | Serial        | UUID                                     || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
2020-10-20 08:18:43,020 - CE - INFO - |  0 | Tesla V100-SXM2-16GB | 0323617004258 | GPU-849088a3-508a-1737-7611-75a087f18085 ||       29C |        0% |          11% ||      16160MB |      1714MB |     14446MB || Enabled      | Disabled       |

Different GPU architectures (in combination with different CUDA versions) might use different memory footprints for the CUDA context.
What is the unit of the torch.cuda.memory... output?
1.7GB is too much for the context alone on a V100; it should be around ~800-1000MB.
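
To see how much of the nvidia-smi number is the context rather than memory held by PyTorch, you could compare the allocator stats with the per-process usage reported by the driver. A minimal sketch, assuming the pynvml package is installed:

import os
import torch
import pynvml

# Force CUDA context creation plus one small allocation.
x = torch.empty(1024, 1024, device='cuda')

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
me = next(p for p in procs if p.pid == os.getpid())

reserved_mb = torch.cuda.memory_reserved() / 1048576  # held by the caching allocator
process_mb = me.usedGpuMemory / 1048576               # what nvidia-smi charges to this process
# (note: usedGpuMemory can be None on some driver/OS combinations)
print('allocator reserved :', reserved_mb, 'MB')
print('process total      :', process_mb, 'MB')
print('~context overhead  :', process_mb - reserved_mb, 'MB')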

The unit of the torch.cuda.memory... output is megabytes (converted from the byte values that PyTorch returns).

The CUDA version is 10.2.89 for all three GPUs.

Thanks for the update.
I cannot reproduce it with CUDA 10.2.89, PyTorch 1.6, and NVIDIA driver 450.51.06: creating a single CUDA tensor results in a CUDA context of ~940MB on a V100-SXM2 32GB using the conda binaries.
Building from source with CUDA 11.1 and a newer PyTorch version creates a context of ~816MB.
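
In case you want to repeat the measurement, this is roughly the experiment (a minimal sketch; the resulting context size will differ per GPU, driver, and build):

import torch
import GPUtil

# A single tiny CUDA tensor triggers the creation of the CUDA context, so
# almost all of the process memory shown by nvidia-smi at this point is
# the context itself, not memory held by the allocator.
x = torch.tensor([1.0], device='cuda')
print(torch.cuda.memory_allocated())  # just a few bytes for the tensor
GPUtil.showUtilization(all=True)      # 'Memory used' is ~ the context size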

Thanks for the response.

The NVIDIA driver version I use is 440.95.01. The versions of the Python libraries in my virtual environment are:

boto3==1.16.6
botocore==1.19.6
certifi==2020.6.20
chardet==3.0.4
click==7.1.2
future==0.18.2
GPUtil==1.4.0
idna==2.10
jmespath==0.10.0
joblib==0.17.0
numpy==1.19.2
pkg-resources==0.0.0
python-dateutil==2.8.1
regex==2020.10.23
requests==2.24.0
s3transfer==0.3.3
sacremoses==0.0.43
sentencepiece==0.1.92
six==1.15.0
torch==1.6.0
tqdm==4.51.0
transformers==2.3.0
urllib3==1.25.11

Unfortunately, I cannot share the original code here, so I am sharing another script that can be used to recreate the same issue.

import sys
from io import StringIO

import GPUtil
import torch
from transformers import DistilBertModel, DistilBertTokenizer


class Capturing(list):
    """Context manager that captures everything written to stdout."""
    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._stringio = StringIO()
        return self

    def __exit__(self, *args):
        self.extend(self._stringio.getvalue().splitlines())
        del self._stringio  # free up some memory
        sys.stdout = self._stdout


def get_torch_cuda_memory_info():
    # GPUtil reports the per-device numbers that nvidia-smi shows.
    with Capturing() as stdout_captured:
        GPUtil.showUtilization(all=True)
    for stdout in stdout_captured:
        print(stdout)

    # PyTorch reports bytes; divide by 1048576 to convert to megabytes.
    print('\ntorch.cuda.max_memory_allocated() : ' + str(torch.cuda.max_memory_allocated() / 1048576))
    print('torch.cuda.memory_allocated() : ' + str(torch.cuda.memory_allocated() / 1048576))
    print('torch.cuda.memory_reserved() : ' + str(torch.cuda.memory_reserved() / 1048576))
    print('torch.cuda.max_memory_reserved() : ' + str(torch.cuda.max_memory_reserved() / 1048576))


if __name__ == "__main__":

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print('device : ', device)

    # Load DistilBERT and move it to the GPU; the memory stats are read
    # right after the model is placed on the device.
    model_name = 'distilbert-base-uncased'
    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
    model = DistilBertModel.from_pretrained(model_name).to(device)

    get_torch_cuda_memory_info()

The output on the V100 is:

To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
device :  cuda
| ID | Name                 | Serial        | UUID                                     || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  0 | Tesla V100-SXM2-16GB | 1562919007687 | GPU-4ed687c6-bd14-8e65-d88d-a4e5b1638697 ||       35C |        5% |           9% ||      16160MB |      1452MB |     14708MB || Enabled      | Disabled       |

torch.cuda.max_memory_allocated() : 254.234375
torch.cuda.memory_allocated() : 254.234375
torch.cuda.memory_reserved() : 272.0
torch.cuda.max_memory_reserved() : 272.0

The output on the K80 is:

To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
device :  cuda
| ID | Name      | Serial        | UUID                                     || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  0 | Tesla K80 | 0321216031679 | GPU-a7886e31-16b5-8601-d40f-3c04a9b8501f ||       39C |       21% |           6% ||      11441MB |       635MB |     10806MB || Disabled     | Disabled       |

torch.cuda.max_memory_allocated() : 254.234375
torch.cuda.memory_allocated() : 254.234375
torch.cuda.memory_reserved() : 272.0
torch.cuda.max_memory_reserved() : 272.0

Using driver 440.33.01 on V100 16GB GPUs, I get these results:

PyTorch 1.7.0 + CUDA 10.2 binaries (built for 'sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'compute_37'):

| ID | Name                 | [...]  | [...] || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  0 | Tesla V100-DGXS-16GB | [...] | [...] ||       36C |       13% |           8% ||      16158MB |      1331MB |     14827MB || Disabled     | Disabled       |
[...]
torch.cuda.max_memory_allocated() : 254.234375
torch.cuda.memory_allocated() : 254.234375
torch.cuda.memory_reserved() : 272.0
torch.cuda.max_memory_reserved() : 272.0

PyTorch master + CUDA 10.2 (built for 'sm_70', 'sm_75', 'compute_75'):

| ID | Name                 | [...] | [...] || GPU temp. | GPU util. | Memory util. || Memory total | Memory used | Memory free || Display mode | Display active |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  0 | Tesla V100-DGXS-16GB | [...] | [...] ||       37C |        9% |           7% ||      16158MB |      1083MB |     15075MB || Disabled     | Disabled       |
[...]
torch.cuda.max_memory_allocated() : 254.234375
torch.cuda.memory_allocated() : 254.234375
torch.cuda.memory_reserved() : 272.0
torch.cuda.max_memory_reserved() : 272.0
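
The numbers above suggest that a binary built for fewer architectures creates a smaller CUDA context. To check which architectures (and which CUDA toolkit) your installed binary was built for, you could use torch.cuda.get_arch_list(), available in PyTorch 1.7+. A small sketch:

import torch

print(torch.version.cuda)                   # CUDA toolkit the binary was built with, e.g. '10.2'
print(torch.cuda.get_arch_list())           # e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75']
print(torch.cuda.get_device_capability(0))  # e.g. (7, 0) for a V100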