I'm not sure how best to categorise this question, as no existing category stands out as an obvious fit. Feel free to recategorise it.
I am currently trying to run a simple pooling layer through PyTorch 1.6.0 as follows:
# run-torch-pooling.py
import torch
import numpy as np
# Get input tensor
t = torch.from_numpy(np.array([i for i in range(224*224*3)], dtype=np.float32).reshape((1, 3, 224, 224)))
pool = torch.nn.MaxPool2d(3, stride=2)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
if torch.cuda.is_available():
    t = t.to(device)
    with torch.cuda.device(0):
        output = pool(t)
# Do stuff with output
It's a simple program, and it works fine. My issue is that if I run the following command (assuming an appropriate environment on Ubuntu):
$ (pyenv) CUDNN_LOGINFO_DBG=1 CUDNN_LOGDEST_DBG=stdout python run-torch-pooling.py
the expected cuDNN trace never appears. Am I right in thinking that this is because cuDNN is opened in a separate process and so I can’t introspect the raw cuDNN dump? For what it’s worth, the analogous program for TensorFlow works as expected and dumps the cuDNN kernel calls.
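To rule out the two environment variables simply not reaching the interpreter, the same invocation can be driven from a small launcher. This is only a sketch using the standard library: the helper names `cudnn_logging_env` and `run_with_cudnn_logging` are made up for illustration, and it shows only that a child process inherits the variables, not that cuDNN is ever called.

```python
import os
import subprocess
import sys


def cudnn_logging_env(dest="stdout"):
    """Return a copy of the current environment with cuDNN logging requested."""
    env = dict(os.environ)  # copy, so the parent environment is left untouched
    env["CUDNN_LOGINFO_DBG"] = "1"
    env["CUDNN_LOGDEST_DBG"] = dest
    return env


def run_with_cudnn_logging(script, dest="stdout"):
    """Run `script` with the current interpreter and the logging variables set.

    Returns the CompletedProcess so stdout and stderr can be inspected.
    """
    return subprocess.run(
        [sys.executable, script],
        env=cudnn_logging_env(dest),
        capture_output=True,
        text=True,
    )

# Hypothetical usage:
#   result = run_with_cudnn_logging("run-torch-pooling.py")
#   print(result.stdout)
```

If the captured output still contains no trace, the variables were set for the process that ran the script, so the cause would have to lie elsewhere.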
I can verify that my GPU is being used, as I can observe memory allocations being done in nvidia-smi.
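Since nvidia-smi only shows that CUDA memory is being allocated, a separate check is whether this PyTorch build reports cuDNN as available and enabled. Below is a minimal sketch, assuming the `torch.backends.cudnn` helpers behave the same in the installed version; the function name `cudnn_status` is made up for illustration. It does not show whether `MaxPool2d` is actually dispatched to cuDNN, which remains the open question.

```python
def cudnn_status():
    """Collect what the installed PyTorch build reports about CUDA and cuDNN."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None}
    return {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_enabled": torch.backends.cudnn.enabled,
        # version() returns None when cuDNN cannot be initialised.
        "cudnn_version": torch.backends.cudnn.version(),
    }


for key, value in cudnn_status().items():
    print(f"{key}: {value}")
```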
If anyone could advise on how to activate this kind of trace, that would be greatly appreciated.