Hi! When running Nsight Compute on a remote cluster with the command
run_script.sh profile_test.py
I get the error:
==PROF== Connected to process 1700339 (<path_to_conda_env>/bin/python3.10)
WARNING:2023-11-29 19:41:07 1700339:1700339 init.cpp:155] function cbapi->getCuptiStatus() failed with error CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED (39)
WARNING:2023-11-29 19:41:07 1700339:1700339 init.cpp:156] CUPTI initialization failed - CUDA profiler activities will be missing
INFO:2023-11-29 19:41:07 1700339:1700339 init.cpp:158] If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to NVIDIA Development Tools Solutions - | NVIDIA Developer
==PROF== Disconnected from process 1700339
==WARNING== No kernels were profiled.
run_script.sh loads some modules, then calls ncu (I removed --target-processes all in case that was causing the issue):
#!/bin/bash
module load python/3.10.12
module load gcc/9.5.0
module load cuda/12.0.1
module load cudnn/8.9.2.26_cuda12
HOME_DIR=<path_to_my_home_dir>
mamba activate $HOME_DIR/env
module load cmake
ncu --set full --export $HOME_DIR/ncu_data/profile_results $HOME_DIR/mamba_env/bin/python3 $1
All profile_test.py does is import torch and push/pop an NVTX range around a print:
import torch
from torch.cuda import nvtx
nvtx.range_push('start')
print('Hello World')
nvtx.range_pop()
Do you know why this might be? According to the Nsight Compute documentation,
- Added error code CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED to indicate the presense of another CUPTI subscriber. API cuptiSubscribe() returns the new error code than CUPTI_ERROR_MAX_LIMIT_REACHED,
though I’m not sure what might be causing the error on my end. I’m running the commands on an isolated node with V100s.
If I instead run a Python program that only calls “import os” or “import numpy as np”, I get the “No kernels were profiled” warning but not the CUPTI error. So I suspect this has something to do with torch also using CUPTI. Notably, I built torch from source, and the build logs contain “Using Kineto with CUPTI support.”
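One caveat about my test: profile_test.py only imports torch and pushes an NVTX range, so it never launches a CUDA kernel, and “No kernels were profiled” would be expected from it regardless of the CUPTI warning. To separate the two issues, a variant that actually launches a kernel could look like the sketch below. The function name run_probe and the matrix sizes are just placeholders I picked, and the fallbacks are only there so the script exits cleanly on a node without torch or a GPU.

```python
"""Hypothetical variant of profile_test.py that launches at least one CUDA kernel."""


def run_probe() -> str:
    # Import lazily so the script still reports something useful without torch.
    try:
        import torch
    except ImportError:
        return "torch-missing"

    # Without a visible GPU there is nothing for ncu to profile.
    if not torch.cuda.is_available():
        return "no-cuda"

    from torch.cuda import nvtx

    nvtx.range_push("matmul")
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")
    c = a @ b                   # launches a GEMM kernel on the GPU
    torch.cuda.synchronize()    # wait for the kernel to finish before exiting
    nvtx.range_pop()
    return f"ok: result shape {tuple(c.shape)}"


if __name__ == "__main__":
    print(run_probe())
```

If ncu still reports no kernels with a script like this, the problem is in the profiler attach rather than in the test program being empty.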
At the end of the day, I would like to be able to profile PyTorch models using Nsight Compute.
Thanks!