Nsight Compute CUPTI Error

Hi! When running Nsight Compute on a remote cluster with the command

run_script.sh profile_test.py

I get the error:

==PROF== Connected to process 1700339 (<path_to_conda_env>/bin/python3.10)
WARNING:2023-11-29 19:41:07 1700339:1700339 init.cpp:155] function cbapi->getCuptiStatus() failed with error CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED (39)
WARNING:2023-11-29 19:41:07 1700339:1700339 init.cpp:156] CUPTI initialization failed - CUDA profiler activities will be missing
INFO:2023-11-29 19:41:07 1700339:1700339 init.cpp:158] If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to NVIDIA Development Tools Solutions - | NVIDIA Developer
==PROF== Disconnected from process 1700339
==WARNING== No kernels were profiled.

run_script.sh loads a few modules and then calls ncu (I removed --target-processes all in case that flag was causing the issue):


module load python/3.10.12
module load gcc/9.5.0
module load cuda/12.0.1
module load cudnn/


mamba activate $HOME_DIR/env

module load cmake

ncu --set full --export $HOME_DIR/ncu_data/profile_results $HOME_DIR/mamba_env/bin/python3 $1

and all profile_test.py does is import torch, with some nvtx markers:

import torch
from torch.cuda import nvtx

print('Hello World')
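(As an aside: with a script like this, the "No kernels were profiled" warning is expected even when CUPTI initializes cleanly, because nothing in it ever launches a CUDA kernel. Below is a minimal sketch of a profile target that does launch one; this is my own example, not from the thread, and it degrades gracefully when torch or a GPU is absent.)

```python
def launch_profile_target(n=1024):
    """Launch one matmul kernel inside an NVTX range; return a status string."""
    import importlib.util
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        return "no CUDA device"
    from torch.cuda import nvtx
    x = torch.randn(n, n, device="cuda")
    with nvtx.range("matmul"):  # appears as an NVTX range in the report
        y = x @ x               # this matmul is the kernel ncu should capture
    torch.cuda.synchronize()    # make sure the kernel actually ran
    return "kernel launched"
```

With a target like this, ncu should report at least one kernel, and the NVTX range should show up if you profile with ncu's --nvtx option.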

Do you know why this might be? According to the CUPTI documentation,

  • Added error code CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED to indicate the presence of another CUPTI subscriber. API cuptiSubscribe() returns the new error code instead of CUPTI_ERROR_MAX_LIMIT_REACHED,

though I’m not sure what might be causing the error on my end. I’m running the commands on an isolated node with V100s.

If I run a Python program that simply calls "import os" or "import numpy as np", I get only the "No kernels were profiled" warning, not the CUPTI error. So I'm wondering whether this has something to do with torch also using CUPTI. Notably, I built torch from source, and in the build logs I found "Using Kineto with CUPTI support."
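One way to test that hypothesis, assuming a rebuild is feasible (my own suggestion, not a confirmed fix): rebuild torch with Kineto disabled so it never subscribes to CUPTI, then rerun ncu. USE_KINETO is an existing PyTorch build-time switch.

```shell
# Rebuild PyTorch from source with Kineto (and hence its CUPTI subscriber) disabled.
# If ncu then profiles kernels without CUPTI_ERROR_MULTIPLE_SUBSCRIBERS_NOT_SUPPORTED,
# that would confirm the conflict is between Kineto's CUPTI init and Nsight Compute.
USE_KINETO=0 python setup.py develop
```

If the error disappears with this build, that points at Kineto initializing CUPTI at import time.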

At the end of the day, I would like to be able to profile PyTorch models using Nsight Compute.


The issue was most likely introduced in this PR and should be reverted. I’ll ping the code owners again, since we are hitting the same issue internally and have already bisected it to this PR.

CC @malfet as we just discussed it.

A revert landed in PR116201. I didn’t see any comments in the regressing PR, so I thought a workaround was being developed at NVIDIA. Sorry about this issue.