Program stops randomly with segmentation fault


Thread 0x00007fc9bc352700 (most recent call first):
<no Python frame>

Thread 0x00007fc99e9f5700 (most recent call first):
  File "/usr/lib/python3.8/threading.py", line 306 in wait
  File "/usr/lib/python3.8/threading.py", line 558 in wait
  File "/mnt/code/venv/lib/python3.8/site-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap

Current thread 0x00007fca571c8740 (most recent call first):
  File "/mnt/code/venv/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 196 in <genexpr>
  File "/mnt/code/venv/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 196 in __getitem__
  File "/mnt/code/venv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51 in <listcomp>
  File "/mnt/code/venv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51 in fetch
  File "/mnt/code/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 677 in _next_data
  File "/mnt/code/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 633 in __next__
  File "src/train.py", line 262 in <module>
Segmentation fault (core dumped)

I get this segmentation fault at random points after training begins. This particular crash occurred at epoch 3223. Line 262 of my code is as follows.

for x, y in data_loader:

where data_loader = DataLoader(dataset, mini_batch_size). I also use tqdm to display the training progress, which is why its monitor thread shows up in the dump above. Can anyone please help me fix this?

@ptrblck can you please have a look?

It’s hard to tell what might be causing the segfault, but you could try to execute your code via:

gdb --args python script.py
...
run
...
bt

to get the backtrace and post it here.
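
The gdb backtrace shows the native (C/C++) frames. For the Python side, the standard-library faulthandler module can write a per-thread Python traceback when the process receives a fatal signal such as SIGSEGV; the dump in the first post appears to be in that format. Below is a minimal sketch, assuming you want the dump in a file so it survives a lost terminal. The helper name enable_crash_tracebacks and the log file name are hypothetical and not part of any library.

```python
import faulthandler


def enable_crash_tracebacks(log_path="segfault_traceback.log"):
    """Dump Python tracebacks of all threads to log_path on a fatal signal.

    The function name and the default file name are hypothetical.
    """
    # faulthandler writes through the raw file descriptor, so the file
    # object must stay open for as long as the handler is enabled.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Usage, at the very top of the training script (keep the returned object alive):
# crash_log = enable_crash_tracebacks()
```

The same handler can be switched on without touching the code by running python3 -X faulthandler script.py or by setting the PYTHONFAULTHANDLER environment variable; in that case the dump goes to stderr.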

Hello @ptrblck,

Thank you for your reply. My code runs on python3, so I ran the following command:

gdb --args python3 src/code.py

Pardon me if I did something wrong; I have no experience with gdb. However, I get the following message:

Reading symbols from python3...
(No debugging symbols found in python3)

Can you please let me know what is wrong?

Did you execute the run command after launching gdb? If so, did the code fail again? What did bt print?

Thank you for your reply. Under gdb the program ran with no issues. I’ll keep running it under gdb. If it stops with a segmentation fault, I’ll get back to you.