Does not contain CUDA Unified Memory CPU page faults data

Dear community,
I replaced cudaMalloc with cudaMallocManaged in c10/cuda/CUDACachingAllocator.cpp in the PyTorch open-source code (version v1.13.0) and compiled it successfully. The modified build works as expected.

When training a GNN with this build, I was able to oversubscribe the GPU memory.

However, when I profile the Python program with nsys:
nsys profile --stats=true --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true --show-output=true python reddit.py
no page faults are reported at all.

The relevant part of the stats output is:

...
[ 9/11] Executing 'um_sum' stats report
SKIPPED: /mypath/report90.sqlite does not contain CUDA Unified Memory CPU page faults data.
[10/11] Executing 'um_total_sum' stats report
SKIPPED: /mypath/report90.sqlite does not contain CUDA Unified Memory CPU page faults data.
[11/11] Executing 'um_cpu_page_faults_sum' stats report
SKIPPED: /mypath/report90.sqlite does not contain CUDA Unified Memory CPU page faults data.

Why is that?

Thanks.

How did you edit CUDACachingAllocator.cpp? Did you only replace cudaMalloc() with cudaMallocManaged(), or did you also replace cudaMalloc_count{}, cudaMallocMaybeCapturing(), release_lock_on_cudamalloc() and cudaMallocAsync()?
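
It may also help to check what the exported report actually contains, since the SKIPPED messages only say that the stats scripts found no matching data. The following is a minimal sketch using Python's standard sqlite3 module. It lists the tables in the .sqlite file whose names start with a given prefix, together with their row counts. The helper name list_um_tables is hypothetical, and the default prefix "CUDA_UM" is an assumption about how nsys names its Unified Memory tables, so adjust it to whatever your nsys version exports.

```python
import os
import sqlite3


def list_um_tables(sqlite_path, prefix="CUDA_UM"):
    """Return {table_name: row_count} for tables whose name starts with prefix.

    The default prefix is an assumption about the nsys export schema;
    pass prefix="" to list every table in the file.
    """
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.exists(sqlite_path):
        raise FileNotFoundError(sqlite_path)

    con = sqlite3.connect(sqlite_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            if name.startswith(prefix):
                # Table names come from the catalog itself, so quoting is enough here.
                counts[name] = con.execute(
                    f'SELECT COUNT(*) FROM "{name}"'
                ).fetchone()[0]
        return counts
    finally:
        con.close()
```

For example, calling list_um_tables("/mypath/report90.sqlite") and getting back an empty dict would suggest that no Unified Memory events were recorded at all, rather than the stats scripts failing to read them.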