I’m trying to correlate forward and backward operations using the autograd profiler. I’m using Nsight Systems now that nvprof is EOL, but I don’t think the issue is related to that.
According to the current docs for torch.autograd.profiler.emit_nvtx, you can correlate them based on a seq=<seq_nr> in forward pass operations and a stashed seq=<seq_nr> in the corresponding backward operations. However, in the output from 1.3 I don’t find any stashed seq numbers, just seq numbers. It looks like a subsequent refactoring of torch/csrc/autograd/profiler.cpp may have altered this, as I can find stashed seq in earlier code but not in the current code.
As I understand it from the docs, stashed seq= is useful for distinguishing backward operations that correlate to a forward operation from backward operations that don’t, which will just have a seq=.
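To make the correlation I’m after concrete, here is a minimal sketch of how I would pair ranges once the NVTX range names are exported as plain strings. The helper names (parse_range, correlate) and the sample range names are hypothetical, and the exact text layout of a range name (an op name followed by ", seq=N" or ", stashed seq=N") is an assumption that may differ between PyTorch versions.

```python
import re
from collections import defaultdict

# Matches "seq=12" or "stashed seq=12", tolerating spaces around "=".
# ASSUMPTION: range names look like "<op>, seq=<N>" or "<op>, stashed seq=<N>".
SEQ_RE = re.compile(r"(?P<stashed>stashed\s+)?seq\s*=\s*(?P<nr>\d+)")


def parse_range(name):
    """Return (op_name, kind, seq_nr), or None if the name has no seq tag."""
    m = SEQ_RE.search(name)
    if m is None:
        return None
    op = name[:m.start()].rstrip(", ")
    kind = "stashed_seq" if m.group("stashed") else "seq"
    return op, kind, int(m.group("nr"))


def correlate(range_names):
    """Group op names by sequence number, split by tag kind."""
    groups = defaultdict(lambda: {"seq": [], "stashed_seq": []})
    for name in range_names:
        parsed = parse_range(name)
        if parsed is None:
            continue  # e.g. CUDA API ranges that carry no seq tag
        op, kind, nr = parsed
        groups[nr][kind].append(op)
    return dict(groups)


# Hypothetical range names, not real profiler output.
sample = [
    "mul, seq=5",
    "add, seq=6",
    "AddBackward0, stashed seq=6",
    "MulBackward0, stashed seq=5",
    "cudaLaunch",
]
result = correlate(sample)
```

With the hypothetical sample above, result[5] pairs mul with MulBackward0. In my 1.3 output the stashed_seq lists would all be empty, which is exactly the problem.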
I don’t have easily runnable code to post (but can put something together if that would help). I’m running with the code from the emit_nvtx docs:
    with torch.cuda.profiler.profile():
        with torch.autograd.profiler.emit_nvtx():
            ...
Then launching with:
nsys profile -t cublas,cuda,cudnn,nvtx -c cudaProfilerApi --kill=none $(which python) my_script.py
I’m using NVIDIA_Nsight_Systems_Linux_2019.5.1.58 (the nvtx option was only added in recent versions). As noted, I do get the seq= NVTX ranges, just not the stashed seq= ones, so NVTX ranges themselves seem to be working.