How to use pdb or gdb to debug from Python into C/C++ code?

@albanD @apaszke

I have managed to use pdb to explore the Python source code of PyTorch, but I also want to explore the lower-level code written in C/C++.

For example, to explore F.conv2d, pdb takes me as far as:

  50  ->     f = ConvNd(_pair(stride), _pair(padding), _pair(dilation), False,
  51                    _pair(0), groups, torch.backends.cudnn.benchmark, torch.backends.cudnn.enabled)

From there I can find ConvNd defined as ConvNd = torch._C._functions.ConvNd at L19, but pdb cannot go any further into the C/C++ code underneath PyTorch. With help from @albanD, I found ConvNd in a cpp file at L248, but pdb cannot help me explore the functions and values in the C/C++ code.
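A quick way to see where pdb runs out of road is to ask whether Python source exists for an object at all. The sketch below uses only the standard library. `has_python_source` is a hypothetical helper name, and the expectation that objects under `torch._C` behave like the built-in `len` here is an assumption, not something verified in this thread.

```python
import inspect
import json


def has_python_source(obj):
    """Return True if a Python source file can be located for obj."""
    try:
        # Returns None for objects defined in compiled extension modules.
        return inspect.getsourcefile(obj) is not None
    except TypeError:
        # Raised for built-in functions and classes, which have no Python source.
        return False


# json.dumps is written in Python, so pdb can step into it.
print("json.dumps:", has_python_source(json.dumps))
# len is implemented in C, so pdb steps over it.
print("len:", has_python_source(len))
```

Anything for which this returns False is a point where a native debugger such as gdb has to take over.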

I have installed gdb and can debug C code with it, but I can’t make it work with a PyTorch script. Below are the script and my attempts to use gdb to explore PyTorch from the Python level down to the C/C++ level.

import sys
import torch # numpy
from torch.autograd import Variable # tensor with gradient
import torch.nn as nn # all layers classes
import torch.nn.functional as F # other functions for building model


class Net(nn.Module): # what exactly is inside nn.Module?

  def __init__(self):
    super(Net, self).__init__() # what exactly does Module's __init__ do?
		
    self.conv1 = nn.Conv2d(1, 6, 5)
    self.conv2 = nn.Conv2d(6, 16, 5)
    # an affine operation: y = Wx + b
    self.fc1 = nn.Linear(16 * 5 * 5, 120) # what does nn.Linear build?
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)

  def forward(self, x):

    x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) # look at the code logic of F.relu
    x = F.max_pool2d(F.relu(self.conv2(x)), 2) # look at the code logic of F.max_pool2d
    x = x.view(-1, self.num_flat_features(x)) # Net.forward calls Net.num_flat_features here
    x = F.relu(self.fc1(x)) # how Net.forward and Net.num_flat_features are used
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x

  def num_flat_features(self, x): # product of all dimensions except the first (batch) one
    size = x.size()[1:]
    num_features = 1
    for s in size:
        num_features *= s
    return num_features

net = Net()

single_sample = torch.randn(32,32).unsqueeze(0).unsqueeze(0)
inputs = Variable(torch.randn(1, 1, 32, 32))
output = net(inputs)

conv1_weight_1 = list(net.parameters())[0][0]
target = Variable(torch.arange(1, 11))

criterion = nn.MSELoss()
loss = criterion(output, target)

net.zero_grad()     # zeroes the gradient buffers of all parameters
net.conv1.weight.grad

import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=0.01)
optimizer.zero_grad() # optimizer.zero_grad() == net.zero_grad()


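loss.backward() # gradients of the parameters come into existence here
optimizer.step() # only updates the parameters; returns nothing
net.conv1.bias # show a layer's parameters
net.conv1.weight.grad # show a layer's gradients
net.conv1.zero_grad() # reset a layer's gradients to zero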

My attempt 1:

Focus on one: /Users/Natsume/Documents/shendusuipian/pytorch/raw_pytorch/60min_intro
 ->gdb python3 03neural_networks_tutorial.py
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin16.7.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...(no debugging symbols found)...done.
Illegal process-id: 03neural_networks_tutorial.py.
"/Users/Natsume/Documents/shendusuipian/pytorch/raw_pytorch/60min_intro/03neural_networks_tutorial.py" is not a core dump: File format not recognized

My attempt 2:

Focus on one: /Users/Natsume/Documents/shendusuipian/pytorch/raw_pytorch/60min_intro
 ->gdb --args python3 03neural_networks_tutorial.py
Reading symbols from python3...(no debugging symbols found)...done.
(gdb) start
Function "main" not defined.
Starting program: /usr/local/bin/python3 03neural_networks_tutorial.py
[New Thread 0x1403 of process 2031]
warning: unhandled dyld version (15)
[New Thread 0x1503 of process 2031]

Thread 3 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x1503 of process 2031]
0x0000000100005000 in ?? ()
(gdb) next
Cannot find bounds of current function
(gdb)

Is it possible to use pdb or gdb to explore pytorch code from python level all the way down to C/C++ level?

If so, how could I do it with the code snippet provided above?

If not, how would you explore PyTorch code from the Python level all the way down to the C/C++ level? Could you give me a road map? (I have already started learning C/C++.)

Thank you very much!

Here are some pointers:
https://wiki.python.org/moin/DebuggingWithGdb
http://www.scipy-lectures.org/advanced/debugging/#debugging-segmentation-faults-using-gdb

Generally, you will want to set a breakpoint in gdb using the C++ function name or the file and line number. You can then step through the code line by line with n (next) or s (step).


Thanks a lot @smth

The links are very helpful. I tried to replicate the demo from the scipy-lectures, but it failed.

Focus on one: /Users/Natsume/Desktop
 ->gdb python3
Reading symbols from python3...(no debugging symbols found)...done.
(gdb) run segfault.py
Starting program: /usr/local/bin/python3 segfault.py
[New Thread 0x1403 of process 2470]
warning: unhandled dyld version (15)
[New Thread 0x1503 of process 2470]

Thread 3 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x1503 of process 2470]
0x0000000100005000 in ?? ()
(gdb) bt
#0  0x0000000100005000 in ?? ()
#1  0x0000000100000000 in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x00007fff5fbff988 in ?? ()
#4  0x00007fff5fbff9fc in ?? ()
#5  0x0000000000000000 in ?? ()
(gdb)

Do you know what is wrong with it? Is my gdb not working properly, or did I not install gdb correctly? You can also check my two attempts above in my question.

Reading symbols from python3...(no debugging symbols found)...done.

Is this supposed to happen?

Could you help me confirm whether the problem is with my gdb?

Thanks a lot!

The problem is with Python: you have a release build of it (so without function names). You need a debug build of Python, either from a package if one exists for your machine, or by recompiling it yourself with the --with-pydebug configure flag.
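To check which kind of interpreter you are actually running, something like the following sketch can help. `is_debug_build` is a hypothetical helper name. It relies on `sys.gettotalrefcount`, which CPython only provides in builds configured with `--with-pydebug`, and falls back to the `Py_DEBUG` build variable, which may be missing on some platforms.

```python
import sys
import sysconfig


def is_debug_build():
    """Return True if this interpreter looks like a --with-pydebug build."""
    # sys.gettotalrefcount only exists in debug builds of CPython.
    if hasattr(sys, "gettotalrefcount"):
        return True
    # Py_DEBUG is recorded in the build configuration on Unix-like builds;
    # it may be None or 0 elsewhere.
    return bool(sysconfig.get_config_var("Py_DEBUG"))


print(sys.executable, "debug build:", is_debug_build())
```

Run it with the same python3 that you pass to gdb, since several interpreters may be installed side by side.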


Thanks a lot!

I have tried to compile python from source. However, the problem remains.

Is it because my Python build is still not suitable for use with gdb? Here is the make test result:

----------------------------------------------------------------------
Ran 73 tests in 1.854s

FAILED (failures=5)
test test_urllib2 failed
3 tests failed again:
    test_gdb test_urllib2 test_venv

Total duration: 13 min 22 sec
Tests result: FAILURE
make: *** [test] Error 1
Focus on one: /Users/Natsume/Downloads/Python-3.6.2/debug

I compiled it twice and got the same result as above both times.
What else should I try? Thanks!

I’m posting this in case someone finds it helpful.

If you are debugging a module built with torch.utils.ffi (or cffi), enable debugging symbols (-g) and set the -O0 flag, which overrides the default -O3 from distutils.unixccompiler.py. In my case, it was something like:

CC=g++ CFLAGS="-O0 -g" python build.py

Then, execute gdb python. At the gdb prompt, run your script:

(gdb) run test_module.py

I’m not using python-dbg, so I interrupt the execution with Ctrl+c, then I set a breakpoint at the C++ function I want to debug:

(gdb) b my_conv2d

Then, hit c to continue.
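The run, interrupt, break, continue sequence above can also be scripted. This is a minimal sketch, assuming a gdb on the PATH that supports the `-ex` and `--args` options and the `set breakpoint pending on` setting, which lets a breakpoint be placed on a symbol from a shared library that has not been loaded yet. `gdb_command` is a hypothetical helper; `my_conv2d` and `test_module.py` are the names from the post above.

```python
import shlex
import sys


def gdb_command(script, breakpoints, python=sys.executable):
    """Build a gdb argv that sets breakpoints and then runs `python script`."""
    cmd = ["gdb", "-ex", "set breakpoint pending on"]
    for location in breakpoints:
        # location is a function name ("my_conv2d") or "file.cpp:123"
        cmd += ["-ex", "break " + location]
    # Everything after --args is the program to debug and its arguments.
    cmd += ["-ex", "run", "--args", python, script]
    return cmd


cmd = gdb_command("test_module.py", ["my_conv2d"], python="python")
# Print a copy-pasteable shell command instead of launching gdb directly.
print(" ".join(shlex.quote(part) for part in cmd))
```

The printed command can be pasted into a terminal. With pending breakpoints there is no need to interrupt with Ctrl+c, because gdb resolves the breakpoint once the module's shared library is loaded.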


Tested and confirmed that @victorhcm’s approach works.

Here is what I did in a conda environment:

Set up the testing environment

conda create --name onnx_p36 python=3.6
source activate onnx_p36

Install PyTorch from source with debugging enabled

git clone https://github.com/pytorch/pytorch
cd pytorch
NO_CUDA=1 DEBUG=1 python setup.py build develop

Open a Python shell and run “import torch” to make sure torch was installed successfully.

Now you should be able to debug via “gdb python”.
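Beyond checking that the import succeeds, it can help to know which compiled file the import actually loaded, since that is the file gdb needs symbols for. A small sketch using only the standard library follows. `native_library_path` is a hypothetical helper, and using `torch._C` as the module name is an assumption based on the ConvNd discussion earlier in the thread.

```python
import importlib
import importlib.machinery


def native_library_path(module_name):
    """Return the shared-library file backing module_name, or None if it has none."""
    module = importlib.import_module(module_name)
    path = getattr(module, "__file__", None)
    # Extension modules end in a platform-specific suffix such as .so or .pyd.
    if path and path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return path
    return None


# A pure-Python module has no shared library behind it.
print(native_library_path("json"))
```

In the environment built above, calling `native_library_path("torch._C")` should point into the cloned pytorch directory if the develop build is the one being imported.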


I think you are confusing a development build of Python with a Python built with pydebug support. If you build Python from source without adding any other option, the resulting executable (which is what I mean by a development build) contains debugging symbols. pydebug is not required, although it is useful.
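The difference between the two can be checked from inside the interpreter. This is a rough sketch: `has_debug_flag` is a hypothetical helper that looks for a `-g` style option in the CFLAGS recorded at build time. That only shows debug information was requested when compiling; a packaged binary may have been stripped afterwards, and CFLAGS may be unavailable on some platforms.

```python
import re
import sysconfig


def has_debug_flag(cflags):
    """Return True if a compiler flag string requests debug info (-g, -g3, -ggdb...)."""
    # -g0 disables debug info, and flags like -grecord-gcc-switches are unrelated.
    return any(re.fullmatch(r"-g(gdb)?[1-3]?", flag) for flag in cflags.split())


# CFLAGS can be None on platforms that do not record it.
recorded = sysconfig.get_config_var("CFLAGS") or ""
print("built with -g:", has_debug_flag(recorded))
```

An interpreter can report True here while still not being a pydebug build, which is the situation described above.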

To add to victorhcm’s answer, I came up with a slightly more graceful approach. If you add the one-liner import os, signal; os.kill(os.getpid(), signal.SIGTRAP), execution stops at the exact point you are interested in. It works better than pressing Ctrl+c.

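# a.py

def breakpoint():
    # Note: this name shadows the built-in breakpoint() added in Python 3.7.
    import os, signal
    # Send SIGTRAP to this process; gdb intercepts it and stops here.
    os.kill(os.getpid(), signal.SIGTRAP)

# your code...
breakpoint()  # set a breakpoint
# your code...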

Then start gdb python and run the script as before. gdb stops when the signal is raised, and you can set your C++ breakpoints from there.
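One caveat is that an unhandled SIGTRAP normally terminates the process when no debugger is attached, so a script with the trap left in will die when run on its own. A possible variant, sketched below, only raises the signal when asked to. `gdb_trap` and the `GDB_TRAP` environment variable are hypothetical names chosen for this example.

```python
import os
import signal


def gdb_trap(enabled=None):
    """Raise SIGTRAP in this process if trapping is enabled; return whether it fired."""
    if enabled is None:
        # GDB_TRAP is a hypothetical switch chosen for this sketch.
        enabled = os.environ.get("GDB_TRAP") == "1"
    if not enabled:
        return False
    # gdb intercepts the signal and stops the program at this point.
    os.kill(os.getpid(), signal.SIGTRAP)
    return True


# your code...
gdb_trap(enabled=False)  # no-op here; pass nothing to let GDB_TRAP decide
# your code...
```

With `GDB_TRAP=1` set in the environment that gdb is started from, a plain `gdb_trap()` call stops in the debugger. Without the variable, the same script runs normally.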
