Dear community,
I replaced the CUDA memory allocation call cudaMalloc with cudaMallocManaged in c10/cuda/CUDACachingAllocator.cpp in the PyTorch source code (v1.13.0) and compiled it successfully. The build works as expected.
With this change, I was able to oversubscribe GPU memory when training a GNN.
However, when I profile the Python program with nsys using the following command,
nsys profile --stats=true --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true --show-output=true python reddit.py
the report shows no page faults at all.
The relevant part of the analysis output is as follows:
...
[ 9/11] Executing 'um_sum' stats report
SKIPPED: /mypath/report90.sqlite does not contain CUDA Unified Memory CPU page faults data.
[10/11] Executing 'um_total_sum' stats report
SKIPPED: /mypath/report90.sqlite does not contain CUDA Unified Memory CPU page faults data.
[11/11] Executing 'um_cpu_page_faults_sum' stats report
SKIPPED: /mypath/report90.sqlite does not contain CUDA Unified Memory CPU page faults data.
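To double-check what the report actually contains, independent of the stats scripts, I think a small script like the one below could help. It is only a sketch: it lists every table in the exported SQLite file whose name contains "_UM_" together with its row count. The "_UM_" naming pattern is my assumption about how nsys names its Unified Memory tables, and list_um_tables is a helper name I made up for illustration.

```python
import sqlite3


def list_um_tables(report_path):
    """Return {table_name: row_count} for tables whose name contains '_UM_'.

    Assumption: Unified Memory data in an nsys SQLite export lives in tables
    with '_UM_' in their name. Adjust the filter if your export differs.
    """
    conn = sqlite3.connect(report_path)
    try:
        # sqlite_master lists every table stored in the database file.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            if "_UM_" in name.upper():
                # Count the rows so an empty table can be told apart from a missing one.
                row = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
                counts[name] = row[0]
        return counts
    finally:
        conn.close()
```

Calling list_um_tables("/mypath/report90.sqlite") and getting an empty dict would suggest that no Unified Memory tables were written at all, rather than that the tables exist but are empty.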
Why are no page faults recorded, even though the GPU memory is oversubscribed?
Thanks.