Acquire cuDNN debug trace for PyTorch

I am not sure how best to categorise this topic, as no existing category stands out as an obvious fit for this question. Feel free to recategorise it.

I am currently trying to run a simple pooling layer through PyTorch 1.6.0 as follows:

# run-torch-pooling.py
import torch
import numpy as np

# Build a (1, 3, 224, 224) float32 input tensor holding 0, 1, 2, ...
t = torch.from_numpy(np.arange(224*224*3, dtype=np.float32).reshape((1, 3, 224, 224)))

pool = torch.nn.MaxPool2d(3, stride=2)

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

if torch.cuda.is_available():
  t = t.to(device)

with torch.cuda.device(0):
  output = pool(t)

# Do stuff with output

This is a simple program, and it works fine. My issue arises when I run the following command (assuming an appropriate environment on Ubuntu):

$ (pyenv) CUDNN_LOGINFO_DBG=1 CUDNN_LOGDEST_DBG=stdout python run-torch-pooling.py

No cuDNN trace is printed. Am I right in thinking that this is because cuDNN is opened in a separate process and so I can’t introspect the raw cuDNN dump? For what it’s worth, the analogous program for TensorFlow works as expected and dumps the cuDNN kernel calls.
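For reference, the same invocation can be sketched from Python, which makes it easier to confirm that the two variables actually reach the interpreter that loads cuDNN. The helper names `cudnn_log_env` and `run_with_cudnn_log` are made up for illustration; only the two environment variables come from the command above.

```python
import os
import subprocess
import sys


def cudnn_log_env(dest="stdout", base=None):
    """Return a copy of the environment with cuDNN API logging switched on."""
    env = dict(os.environ if base is None else base)
    env["CUDNN_LOGINFO_DBG"] = "1"   # enable informational logging
    env["CUDNN_LOGDEST_DBG"] = dest  # "stdout", "stderr", or a file path
    return env


def run_with_cudnn_log(script, dest="stdout"):
    """Run `script` in a child interpreter and return its captured stdout."""
    result = subprocess.run(
        [sys.executable, script],
        env=cudnn_log_env(dest),
        capture_output=True,
        text=True,
    )
    return result.stdout


# Usage (hypothetical; assumes the script sits in the current directory):
#   print(run_with_cudnn_log("run-torch-pooling.py"))
```

If the captured output is still empty, the variables were set correctly and the more likely explanation is that no cuDNN API call was made at all.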

I can verify that my GPU is being used, as I can observe memory allocations being done in nvidia-smi.

If anyone could advise on how to enable this kind of trace, that would be greatly appreciated.

I haven’t looked into the source code, but maybe this particular workload is not using cuDNN at all.
Have you checked the kernel calls with nvprof or nsys to make sure cuDNN is used?
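As a quicker first check than a full nvprof or nsys run, something like the following might help. It is only a sketch: it lists the operator names recorded by `torch.autograd.profiler` and filters for ones that mention cuDNN. The helper names `cudnn_ops` and `profiled_op_names` are made up here, and operator names are only a proxy for the kernels actually launched, so nvprof or nsys remains the more definitive tool.

```python
def cudnn_ops(op_names):
    """Return the operator names that mention cuDNN (case-insensitive)."""
    return [name for name in op_names if "cudnn" in name.lower()]


def profiled_op_names(fn):
    """Run fn() under the autograd profiler and return recorded operator names."""
    import torch  # imported lazily so cudnn_ops works without PyTorch installed

    # use_cuda is the older-style flag; newer releases may warn that it is deprecated.
    with torch.autograd.profiler.profile(use_cuda=torch.cuda.is_available()) as prof:
        fn()
    return [event.key for event in prof.key_averages()]


# Usage (requires PyTorch; `pool` and `t` come from run-torch-pooling.py):
#   names = profiled_op_names(lambda: pool(t))
#   print(cudnn_ops(names) or "no cuDNN operators recorded")
```

An empty result for the pooling call would be consistent with the guess above that this workload never reaches cuDNN, which would also explain the missing log.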