When I use gdb to run the code, bt only gives "No stack".
This started happening after I migrated from Ubuntu 18.04 to 22.04.4.
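For reference, this is roughly how I run it under gdb (the script's own command-line arguments are omitted here):

$ gdb --args python train_clip.py
(gdb) run
... (output below, ending in the DataLoader crash) ...
(gdb) bt
No stack.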
The following is the output gdb gives when I "run":
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 10009]
[New Thread 0x7fff39fff640 (LWP 10010)]
[New Thread 0x7fff397fe640 (LWP 10011)]
[New Thread 0x7fff36ffd640 (LWP 10012)]
[New Thread 0x7fff347fc640 (LWP 10013)]
[New Thread 0x7fff2fffb640 (LWP 10014)]
[New Thread 0x7fff2d7fa640 (LWP 10015)]
[New Thread 0x7fff2aff9640 (LWP 10016)]
[New Thread 0x7fff287f8640 (LWP 10017)]
[New Thread 0x7fff25ff7640 (LWP 10018)]
[New Thread 0x7fff257f6640 (LWP 10019)]
[New Thread 0x7fff22ff5640 (LWP 10020)]
[New Thread 0x7fff207f4640 (LWP 10021)]
[New Thread 0x7fff1dff3640 (LWP 10022)]
[New Thread 0x7fff1b7f2640 (LWP 10023)]
[New Thread 0x7fff18ff1640 (LWP 10024)]
[New Thread 0x7fff167f0640 (LWP 10025)]
[New Thread 0x7fff13fef640 (LWP 10026)]
[New Thread 0x7fff117ee640 (LWP 10027)]
[New Thread 0x7fff0efed640 (LWP 10028)]
[New Thread 0x7fff0c7ec640 (LWP 10029)]
[New Thread 0x7fff09feb640 (LWP 10030)]
[New Thread 0x7fff057ea640 (LWP 10031)]
[New Thread 0x7fff02fe9640 (LWP 10032)]
[New Thread 0x7fff027e8640 (LWP 10033)]
[New Thread 0x7ffefffe7640 (LWP 10034)]
[New Thread 0x7ffefd7e6640 (LWP 10035)]
[New Thread 0x7ffefafe5640 (LWP 10036)]
[New Thread 0x7ffef67e4640 (LWP 10037)]
[New Thread 0x7ffef5fe3640 (LWP 10038)]
[New Thread 0x7ffef17e2640 (LWP 10039)]
[New Thread 0x7ffeeefe1640 (LWP 10040)]
[New Thread 0x7ffee973b640 (LWP 10041)]
[New Thread 0x7ffedbf3cac0 (LWP 10042)]
[New Thread 0x7ffedb73ab40 (LWP 10043)]
[New Thread 0x7ffedaf38bc0 (LWP 10044)]
[New Thread 0x7ffeda736c40 (LWP 10045)]
[New Thread 0x7ffed9f34cc0 (LWP 10046)]
[New Thread 0x7ffed9732d40 (LWP 10047)]
[New Thread 0x7ffed8f30dc0 (LWP 10048)]
[New Thread 0x7ffed3ffee40 (LWP 10049)]
[New Thread 0x7ffed37fcec0 (LWP 10050)]
[New Thread 0x7ffed2ffaf40 (LWP 10051)]
[New Thread 0x7ffed27f8fc0 (LWP 10052)]
[New Thread 0x7ffed1ff7040 (LWP 10053)]
[New Thread 0x7ffed17f50c0 (LWP 10054)]
[New Thread 0x7ffed0ff3140 (LWP 10055)]
[New Thread 0x7ffe8ffff1c0 (LWP 10056)]
[New Thread 0x7ffe8f7fd240 (LWP 10057)]
[New Thread 0x7ffe8effb2c0 (LWP 10058)]
[New Thread 0x7ffe8e7f9340 (LWP 10059)]
[New Thread 0x7ffe8dff73c0 (LWP 10060)]
[New Thread 0x7ffe8d7f5440 (LWP 10061)]
[New Thread 0x7ffe8cff34c0 (LWP 10062)]
[New Thread 0x7ffe6bfff540 (LWP 10063)]
[New Thread 0x7ffe63fff5c0 (LWP 10064)]
[Thread 0x7ffeeefe1640 (LWP 10040) exited]
[Thread 0x7ffef17e2640 (LWP 10039) exited]
[Thread 0x7ffef5fe3640 (LWP 10038) exited]
[Thread 0x7ffef67e4640 (LWP 10037) exited]
[Thread 0x7ffefafe5640 (LWP 10036) exited]
[Thread 0x7ffefd7e6640 (LWP 10035) exited]
[Thread 0x7ffefffe7640 (LWP 10034) exited]
[Thread 0x7fff027e8640 (LWP 10033) exited]
[Thread 0x7fff02fe9640 (LWP 10032) exited]
[Thread 0x7fff057ea640 (LWP 10031) exited]
[Thread 0x7fff09feb640 (LWP 10030) exited]
[Thread 0x7fff0c7ec640 (LWP 10029) exited]
[Thread 0x7fff0efed640 (LWP 10028) exited]
[Thread 0x7fff117ee640 (LWP 10027) exited]
[Thread 0x7fff13fef640 (LWP 10026) exited]
[Thread 0x7fff167f0640 (LWP 10025) exited]
[Thread 0x7fff18ff1640 (LWP 10024) exited]
[Thread 0x7fff1b7f2640 (LWP 10023) exited]
[Thread 0x7fff1dff3640 (LWP 10022) exited]
[Thread 0x7fff207f4640 (LWP 10021) exited]
[Thread 0x7fff22ff5640 (LWP 10020) exited]
[Thread 0x7fff257f6640 (LWP 10019) exited]
[Thread 0x7fff25ff7640 (LWP 10018) exited]
[Thread 0x7fff287f8640 (LWP 10017) exited]
[Thread 0x7fff2aff9640 (LWP 10016) exited]
[Thread 0x7fff2d7fa640 (LWP 10015) exited]
[Thread 0x7fff2fffb640 (LWP 10014) exited]
[Thread 0x7fff347fc640 (LWP 10013) exited]
[Thread 0x7fff36ffd640 (LWP 10012) exited]
[Thread 0x7fff397fe640 (LWP 10011) exited]
[Thread 0x7fff39fff640 (LWP 10010) exited]
[Detaching after fork from child process 10065]
[Detaching after fork from child process 10066]
[New Thread 0x7ffeeefe1640 (LWP 10067)]
[New Thread 0x7ffef17e2640 (LWP 10068)]
Converting model to device: cuda:1
Param Count: 11.070111 M
current epoch: 0
[Detaching after fork from child process 10069]
[Detaching after fork from child process 10101]
[Detaching after fork from child process 10102]
[Detaching after fork from child process 10165]
[Detaching after fork from child process 10197]
[Detaching after fork from child process 10229]
[Detaching after fork from child process 10261]
[Detaching after fork from child process 10293]
[New Thread 0x7ffef5fe3640 (LWP 10294)]
[New Thread 0x7ffef67e4640 (LWP 10326)]
[New Thread 0x7fff39dbf640 (LWP 10327)]
[New Thread 0x7fff36ffd640 (LWP 10328)]
[New Thread 0x7fff347fc640 (LWP 10329)]
[New Thread 0x7fff2fffb640 (LWP 10330)]
[New Thread 0x7fff2d7fa640 (LWP 10331)]
[New Thread 0x7fff2aff9640 (LWP 10332)]
[New Thread 0x7fff287f8640 (LWP 10333)]
0%| | 0/313 [00:00<?, ?it/s]
ERROR: Unexpected segmentation fault encountered in worker.
0%| | 0/313 [00:04<?, ?it/s]
Traceback (most recent call last):
File "/home/wzh/.conda/envs/retro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/home/wzh/.conda/envs/retro/lib/python3.8/multiprocessing/queues.py", line 107, in get
if not self._poll(timeout):
File "/home/wzh/.conda/envs/retro/lib/python3.8/multiprocessing/connection.py", line 257, in poll
return self._poll(timeout)
File "/home/wzh/.conda/envs/retro/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
r = wait([self], timeout)
File "/home/wzh/.conda/envs/retro/lib/python3.8/multiprocessing/connection.py", line 931, in wait
ready = selector.select(timeout)
File "/home/wzh/.conda/envs/retro/lib/python3.8/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
File "/home/wzh/.conda/envs/retro/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 10261) is killed by signal: Segmentation fault.
Traceback (most recent call last):
File "train_clip.py", line 330, in <module>
main(args)
File "train_clip.py", line 207, in main
train_loss, train_acc = train_epoch(
File "train_clip.py", line 75, in train_epoch
for batch_id, batch_data in pbar:
File "/home/wzh/.conda/envs/retro/lib/python3.8/site-packages/tqdm/std.py", line 1181, in __iter__
for obj in iterable:
File "/home/wzh/.conda/envs/retro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
data = self._next_data()
File "/home/wzh/.conda/envs/retro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1329, in _next_data
idx, data = self._get_data()
File "/home/wzh/.conda/envs/retro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1295, in _get_data
success, data = self._try_get_data()
File "/home/wzh/.conda/envs/retro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1146, in _try_get_data
raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
RuntimeError: DataLoader worker (pid(s) 10261) exited unexpectedly
The following is the environment information:
Collecting environment information...
PyTorch version: 2.2.1
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35
Python version: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.5.0-26-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090 Ti
GPU 1: NVIDIA GeForce RTX 3080
Nvidia driver version: 550.67
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i9-14900K
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
Stepping: 1
CPU max MHz: 6000.0000
CPU min MHz: 800.0000
BogoMIPS: 6374.40
Versions of relevant libraries:
[pip3] fcd-torch==1.0.7
[pip3] numpy==1.24.3
[pip3] torch==2.2.1
[pip3] torch_geometric==2.5.2
[pip3] torch-metrics==1.1.7
[pip3] torchaudio==2.2.1
[pip3] torchmetrics==1.3.2
[pip3] torchvision==0.17.1
[pip3] triton==2.2.0
[conda] blas 1.0 mkl defaults
[conda] fcd-torch 1.0.7 pypi_0 pypi
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
[conda] mkl 2023.1.0 h213fc3f_46344 defaults
[conda] mkl-service 2.4.0 py38h5eee18b_1 defaults
[conda] mkl_fft 1.3.8 py38h5eee18b_0 defaults
[conda] mkl_random 1.2.4 py38hdb19cb5_0 defaults
[conda] numpy 1.24.3 py38hf6e8229_1 defaults
[conda] numpy-base 1.24.3 py38h060ed82_1 defaults
[conda] pytorch 2.2.1 py3.8_cuda12.1_cudnn8.9.2_0 pytorch
[conda] pytorch-cuda 12.1 ha16c6d3_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch-geometric 2.5.2 pypi_0 pypi
[conda] torch-metrics 1.1.7 pypi_0 pypi
[conda] torchaudio 2.2.1 py38_cu121 pytorch
[conda] torchmetrics 1.3.2 pypi_0 pypi
[conda] torchtriton 2.2.0 py38 pytorch
[conda] torchvision 0.17.1 py38_cu121 pytorch
Is there any conflict between Ubuntu 22.04 and the latest PyTorch? The reason I need to use 22.04 is that it includes the network driver for the ASUS Z790-E motherboard.
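For context, the worker that segfaults (pid 10261) is one of the forked child processes that gdb reports "Detaching after fork" from, which I suspect is why bt in the main process shows no stack. The data loading in train_clip.py is set up roughly like this; the dataset, batch size, and num_workers below are simplified placeholders, not my exact code:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for my real training set.
train_dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 10, (1000,)))

# Roughly how the loader is configured; each worker is a forked child process.
train_loader = DataLoader(
    train_dataset,
    batch_size=64,
    shuffle=True,
    num_workers=8,   # placeholder value; I do use multiple workers
    pin_memory=True,
)

for batch_id, batch_data in enumerate(train_loader):
    pass  # forward/backward pass happens here in the real script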