Segmentation fault with no information about the error

Weng_zhiqiang · December 11, 2019, 11:31am

Hi,

I have an issue that I am not able to solve. A segmentation fault happens when I run this project on its branch for PyTorch 1.0. The project is below:

And here is my env:
Python 3.7
PyTorch 1.0
CUDA 9.0
gcc 4.8.5
The crash usually happens right after the code prints "Loading pretrained weights from …". After that, no further information about the error is shown.
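When a crash prints nothing at all, Python's standard `faulthandler` module can at least report which Python line was executing when the fatal signal arrived. The following is a minimal sketch, assuming it is placed at the very top of the training script; the function name `enable_crash_log` and the log file name are hypothetical and not part of the project.

```python
import faulthandler
import os
import tempfile


def enable_crash_log(path):
    """Dump the Python traceback of every thread to `path` on a fatal signal."""
    log_file = open(path, "w")
    # faulthandler writes to the file descriptor directly, so the file
    # object must stay open for as long as the handler is enabled.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical log location; any writable path works.
crash_log_path = os.path.join(tempfile.gettempdir(), "segfault_trace.log")
crash_log = enable_crash_log(crash_log_path)
print("faulthandler enabled:", faulthandler.is_enabled())
```

The same handler can be switched on without editing the script by starting the interpreter with `python -X faulthandler`. It only shows the Python-level stack, so it complements a native debugger rather than replacing it.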

ptrblck · December 11, 2019, 3:44pm

This seems to be a known issue in the repository.

The repository also doesn’t seem to be maintained anymore.
You could try to get a stack trace via:

$ gdb --args python my_script.py
...
Reading symbols from python...done.
(gdb) run
...
(gdb) backtrace
...

and we could try to debug.
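If an interactive gdb session is inconvenient (for example on a cluster node), the same steps can be run non-interactively. This is only a sketch: the helper names `gdb_backtrace_command` and `run_under_gdb` are made up for illustration, and it assumes `gdb` and `python` are on the PATH. It relies on gdb's `-batch`, `-ex`, and `--args` options.

```python
import shlex
import subprocess


def gdb_backtrace_command(script, script_args=()):
    """Build a gdb command that runs the script and prints a backtrace when it stops."""
    return [
        "gdb", "-batch",      # exit once the listed commands have finished
        "-ex", "run",         # start the program
        "-ex", "backtrace",   # print the native stack after the crash
        "--args", "python", script, *script_args,
    ]


def run_under_gdb(script, script_args=()):
    """Run the command and return gdb's combined output (requires gdb installed)."""
    result = subprocess.run(
        gdb_backtrace_command(script, script_args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


cmd = gdb_backtrace_command("my_script.py")
print(shlex.join(cmd))
```

Calling `run_under_gdb("my_script.py")` would return the full gdb log as a string, which can then be pasted into a reply.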

Weng_zhiqiang · December 15, 2019, 9:10am

Thanks for your reply. I ran the script under gdb as you advised. I'm not sure whether this error comes from using the wrong gcc version. I tried changing gcc from 4.8 to 5.2, but it still didn't work. Maybe the gcc version never actually changed because I did something wrong?
(gdb) run
Starting program: /research/byu2/yxzhou/anaconda3/envs/faster/bin/python trainval_net.py --dataset pascal_voc --net vgg16 --bs 1 --nw 8 --lr 0.0001 --lr_decay_step 10 --cuda
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /research/byu2/yxzhou/anaconda3/envs/faster/lib/python3.7/site-packages/mkl/…/…/…/libiomp5.so
Try: yum --enablerepo='debug' install /usr/lib/debug/.build-id/2f/6a88b5d1a44463ab69c8cd6e8461149a2e775a.debug
Detaching after fork from child process 178410.
warning: File "/research/byu2/yxzhou/anaconda3/envs/faster/lib/libstdc++.so.6.0.21-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /research/byu2/yxzhou/anaconda3/envs/faster/lib/libstdc++.so.6.0.21-gdb.py
line to your configuration file "/uac/msc/yxzhou/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/uac/msc/yxzhou/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
Missing separate debuginfo for /research/byu2/yxzhou/anaconda3/envs/faster/lib/python3.7/site-packages/cv2/.libs/libz-a147dcb0.so.1.2.3
Missing separate debuginfo for /research/byu2/yxzhou/anaconda3/envs/faster/lib/python3.7/site-packages/cv2/.libs/libbz2-7225278b.so.1.0.3
Called with args:
Namespace(batch_size=1, checkepoch=1, checkpoint=0, checkpoint_interval=10000, checksession=1, class_agnostic=False, cuda=True, dataset='pascal_voc', disp_interval=100, large_scale=False, lr=0.0001, lr_decay_gamma=0.1, lr_decay_step=10, mGPUs=False, max_epochs=20, net='vgg16', num_workers=8, optimizer='sgd', resume=False, save_dir='models', session=1, start_epoch=1, use_tfboard=False)
/research/byu2/yxzhou/faster-rcnn.pytorch/lib/model/utils/config.py:374: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml_cfg = edict(yaml.load(f))
Using config:
{'ANCHOR_RATIOS': [0.5, 1, 2], 'ANCHOR_SCALES': [8, 16, 32], 'CROP_RESIZE_WITH_MAX_POOL': False, 'CUDA': False, 'DATA_DIR': '/research/byu2/yxzhou/faster-rcnn.pytorch/data', 'DEDUP_BOXES': 0.0625, 'EPS': 1e-14, 'EXP_DIR': 'vgg16', 'FEAT_STRIDE': [16], 'GPU_ID': 0, 'MATLAB': 'matlab', 'MAX_NUM_GT_BOXES': 20, 'MOBILENET': {'DEPTH_MULTIPLIER': 1.0, 'FIXED_LAYERS': 5, 'REGU_DEPTH': False, 'WEIGHT_DECAY': 4e-05}, 'PIXEL_MEANS': array([[[102.9801, 115.9465, 122.7717]]]), 'POOLING_MODE': 'align', 'POOLING_SIZE': 7, 'RESNET': {'FIXED_BLOCKS': 1, 'MAX_POOL': False}, 'RNG_SEED': 3, 'ROOT_DIR': '/research/byu2/yxzhou/faster-rcnn.pytorch', 'TEST': {'BBOX_REG': True, 'HAS_RPN': True, 'MAX_SIZE': 1000, 'MODE': 'nms', 'NMS': 0.3, 'PROPOSAL_METHOD': 'gt', 'RPN_MIN_SIZE': 16, 'RPN_NMS_THRESH': 0.7, 'RPN_POST_NMS_TOP_N': 300, 'RPN_PRE_NMS_TOP_N': 6000, 'RPN_TOP_N': 5000, 'SCALES': [600], 'SVM': False}, 'TRAIN': {'ASPECT_GROUPING': False, 'BATCH_SIZE': 256, 'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0], 'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0], 'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2], 'BBOX_NORMALIZE_TARGETS': True, 'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True, 'BBOX_REG': True, 'BBOX_THRESH': 0.5, 'BG_THRESH_HI': 0.5, 'BG_THRESH_LO': 0.0, 'BIAS_DECAY': False, 'BN_TRAIN': False, 'DISPLAY': 10, 'DOUBLE_BIAS': True, 'FG_FRACTION': 0.25, 'FG_THRESH': 0.5, 'GAMMA': 0.1, 'HAS_RPN': True, 'IMS_PER_BATCH': 1, 'LEARNING_RATE': 0.01, 'MAX_SIZE': 1000, 'MOMENTUM': 0.9, 'PROPOSAL_METHOD': 'gt', 'RPN_BATCHSIZE': 256, 'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0], 'RPN_CLOBBER_POSITIVES': False, 'RPN_FG_FRACTION': 0.5, 'RPN_MIN_SIZE': 8, 'RPN_NEGATIVE_OVERLAP': 0.35, 'RPN_NMS_THRESH': 0.7, 'RPN_POSITIVE_OVERLAP': 0.7, 'RPN_POSITIVE_WEIGHT': -1.0, 'RPN_POST_NMS_TOP_N': 2000, 'RPN_PRE_NMS_TOP_N': 12000, 'SCALES': [600], 'SNAPSHOT_ITERS': 5000, 'SNAPSHOT_KEPT': 3, 'SNAPSHOT_PREFIX': 'res101_faster_rcnn', 'STEPSIZE': [30000], 'SUMMARY_INTERVAL': 180, 'TRIM_HEIGHT': 600, 'TRIM_WIDTH': 600, 'TRUNCATED': False, 'USE_ALL_GT': True, 'USE_FLIPPED': True, 'USE_GT': False, 'WEIGHT_DECAY': 0.0005}, 'USE_GPU_NMS': True}
[New Thread 0x7fff97941700 (LWP 178421)]
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples…
voc_2007_trainval gt roidb loaded from /research/byu2/yxzhou/faster-rcnn.pytorch/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data…
done
before filtering, there are 10022 images…
after filtering, there are 10022 images…
10022 roidb entries
[New Thread 0x7fff95abf700 (LWP 178463)]
[New Thread 0x7fff952be700 (LWP 178464)]
Loading pretrained weights from data/pretrained_model/vgg16_caffe.pth
Detaching after fork from child process 178493.
Detaching after fork from child process 178494.
Detaching after fork from child process 178495.
Detaching after fork from child process 178496.
Detaching after fork from child process 178497.
Detaching after fork from child process 178498.
Detaching after fork from child process 178499.
Detaching after fork from child process 178500.
[New Thread 0x7fff94abd700 (LWP 178501)]
[New Thread 0x7fff91fff700 (LWP 178502)]
[New Thread 0x7fff917fe700 (LWP 178503)]
[New Thread 0x7fff90ffd700 (LWP 178504)]
[New Thread 0x7fff59fff700 (LWP 178505)]
[New Thread 0x7fff597fe700 (LWP 178506)]
[New Thread 0x7fff58ffd700 (LWP 178546)]
[New Thread 0x7fff587fc700 (LWP 178586)]
[New Thread 0x7fff57ffb700 (LWP 178626)]

Program received signal SIGSEGV, Segmentation fault.
0x00007fff9dc253cc in construct<_object*, _object*> (__p=0xb, this=0x555556853298) at /usr/include/c++/4.8.2/ext/new_allocator.h:120
120 { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
Missing separate debuginfos, use: debuginfo-install glib2-2.56.1-5.el7.x86_64 glibc-2.17-292.el7.x86_64 libICE-1.0.9-9.el7.x86_64 libSM-1.2.2-2.el7.x86_64 libX11-1.6.7-2.el7.x86_64 libXau-1.0.8-2.1.el7.x86_64 libXext-1.3.3-3.el7.x86_64 libXrender-0.9.10-1.el7.x86_64 libuuid-2.23.2-61.el7.x86_64 libxcb-1.13-1.el7.x86_64 pcre-8.32-17.el7.x86_64

(gdb) backtrace
#0 0x00007fff9dc253cc in construct<_object*, _object*> (__p=0xb, this=0x555556853298) at /usr/include/c++/4.8.2/ext/new_allocator.h:120
#1 _S_construct<_object*, _object*> (__p=0xb, __a=...) at /usr/include/c++/4.8.2/bits/alloc_traits.h:254
#2 construct<_object*, _object*> (__p=0xb, __a=...) at /usr/include/c++/4.8.2/bits/alloc_traits.h:393
#3 emplace_back<_object*> (this=0x555556853298) at /usr/include/c++/4.8.2/bits/vector.tcc:96
#4 push_back (__x=<unknown type in /research/byu2/yxzhou/faster-rcnn.pytorch/lib/model/_C.cpython-37m-x86_64-linux-gnu.so, CU 0x0, DIE 0x128976>,
this=0x555556853298) at /usr/include/c++/4.8.2/bits/stl_vector.h:920
#5 loader_life_support (this=0x7fffffffbc00) at /research/byu2/yxzhou/anaconda3/envs/faster/lib/python3.7/site-packages/torch/lib/include/pybind11/cast.h:44
#6 pybind11::cpp_function::dispatcher (self=, args_in=0x7fff95c3d960, kwargs_in=0x0)
at /research/byu2/yxzhou/anaconda3/envs/faster/lib/python3.7/site-packages/torch/lib/include/pybind11/pybind11.h:618
#7 0x00005555556b7c34 in _PyMethodDef_RawFastCallKeywords () at /tmp/build/80754af9/python_1572016129546/work/Objects/call.c:694
#8 0x00005555556b7d51 in _PyCFunction_FastCallKeywords (func=0x7fff9dccb870, args=args@entry=0x5555be71ef50, nargs=nargs@entry=3, kwnames=kwnames@entry=0x0)
at /tmp/build/80754af9/python_1572016129546/work/Objects/call.c:734
#9 0x0000555555723974 in call_function (kwnames=0x0, oparg=3, pp_stack=) at /tmp/build/80754af9/python_1572016129546/work/Python/ceval.c:4568
#10 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1572016129546/work/Python/ceval.c:3124
#11 0x00005555556681db in function_code_fastcall (globals=, nargs=2, args=, co=0x7fff9dcfa390)
at /tmp/build/80754af9/python_1572016129546/work/Objects/call.c:283
#12 _PyFunction_FastCallDict () at /tmp/build/80754af9/python_1572016129546/work/Objects/call.c:322
#13 0x0000555555668636 in _PyObject_FastCallDict () at /tmp/build/80754af9/python_1572016129546/work/Objects/call.c:98
#14 0x0000555555686e33 in _PyObject_Call_Prepend () at /tmp/build/80754af9/python_1572016129546/work/Objects/call.c:908
#15 0x0000555555679a3e in PyObject_Call () at /tmp/build/80754af9/python_1572016129546/work/Objects/call.c:245
#16 0x000055555572112a in do_call_core (kwdict=0x7fff95fe8a00, callargs=0x7fff95b42250, func=0x7fff95ff5780)
at /tmp/build/80754af9/python_1572016129546/work/Python/ceval.c:4645