PyTorch training crash with core dump

pystack core --native-all /root/code/emotion_lip_gen/core

Using executable found in the core file: /usr/bin/python

Core file information:
state: R zombie: True niceness: 0
pid: 5277 ppid: 5276 sid: 31
uid: 0 gid: 0 pgrp: 5276
executable: python arguments: python train_syncnet.py --save_dir all_hubert_15 --dataset syncnet --project sy
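A core file like this one is only written if the crashing process's core-size limit (what `ulimit -c` shows in the shell) and the kernel's `/proc/sys/kernel/core_pattern` allow it. The sketch below assumes Linux. It raises the soft limit to the hard limit from inside a script such as `train_syncnet.py`, so that later crashes also leave a core for pystack. `enable_core_dumps` is a hypothetical helper. It cannot help if the hard limit is 0 or if `core_pattern` sends cores somewhere else.

```python
import resource


def enable_core_dumps() -> tuple[int, int]:
    """Raise the soft RLIMIT_CORE to the hard limit and return the new (soft, hard) pair."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to, but not beyond, the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    # Call this early, before DDP/NCCL setup; processes forked afterwards inherit the limit.
    print("RLIMIT_CORE (soft, hard):", enable_core_dumps())
```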

The process died due receiving signal SIGABRT
Traceback for thread 5277 [Has the GIL] (most recent call last):
(C) File “???”, line 0, in _start (python)
(C) File “…/csu/libc-start.c”, line 392, in __libc_start_main@@GLIBC_2.34 (libc.so.6)
(C) File “…/sysdeps/nptl/libc_start_call_main.h”, line 58, in __libc_start_call_main (libc.so.6)
(C) File “???”, line 0, in Py_BytesMain (python)
(C) File “???”, line 0, in Py_RunMain (python)
(C) File “???”, line 0, in Py_FinalizeEx (python)
(C) File “???”, line 0, in pybind11_object_dealloc (libtorch_python.so)
(C) File “???”, line 0, in pybind11::detail::clear_instance(object*) (libtorch_python.so)
(C) File “???”, line 0, in pybind11::class_<c10d::Reducer, std::shared_ptr<c10d::Reducer> >::dealloc(pybind11::detail::value_and_holder&) (libtorch_python.so)
(C) File “???”, line 0, in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (libtorch_python.so)
(C) File “???”, line 0, in std::_Sp_counted_ptr<c10d::Reducer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (libtorch_python.so)
(C) File “???”, line 0, in c10d::Reducer::~Reducer() (libtorch_cpu.so)
(C) File “???”, line 0, in c10::TensorImpl::~TensorImpl() (libc10.so)
(C) File “???”, line 0, in c10::TensorImpl::~TensorImpl() (libc10.so)
(C) File “???”, line 0, in c10::intrusive_ptr<c10::StorageImpl, c10::detail::intrusive_target_default_null_type<c10::StorageImpl> >::reset_() (libc10.so)
(C) File “???”, line 0, in c10::StorageImpl::~StorageImpl() (libtorch_python.so)
(C) File “???”, line 0, in _Unwind_Resume (libgcc_s.so.1)
(C) File “???”, line 0, in __gxx_personality_v0 (libstdc++.so.6)
(C) File “./stdlib/abort.c”, line 79, in abort (libc.so.6)
(C) File “…/sysdeps/posix/raise.c”, line 26, in raise (libc.so.6)
(C) File “./nptl/pthread_kill.c”, line 89, in pthread_kill@@GLIBC_2.34 (libc.so.6)
(C) File “./nptl/pthread_kill.c”, line 78, in __pthread_kill_internal (inlined) (libc.so.6)
(C) File “./nptl/pthread_kill.c”, line 44, in __pthread_kill_implementation (inlined) (libc.so.6)
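Read from the bottom up, the crashing thread (5277, the one holding the GIL) aborts while `Py_FinalizeEx` is destroying a `c10d::Reducer`, which is the C++ object that does gradient bucketing for `DistributedDataParallel`. So the crash happens during interpreter shutdown, after `train_syncnet.py` itself has returned, which is consistent with no Python frames appearing. The `_Unwind_Resume` → `__gxx_personality_v0` → `abort` frames suggest that a C++ exception was propagating out of a destructor. Destructors are `noexcept` by default, so this would make the C++ runtime terminate the process. That reading is inferred from the frame names only.

Below is a small stdlib sketch for extracting that chain of function names from a pystack traceback. `FRAME_RE`, `frame_functions` and `shutdown_chain` are hypothetical names. The regex has only been checked against the line format shown here.

```python
import re

# Matches one native frame as printed by `pystack core --native-all`, e.g.
#   (C) File "???", line 0, in c10d::Reducer::~Reducer() (libtorch_cpu.so)
# The file part is matched loosely, so straight or curly quotes both work.
FRAME_RE = re.compile(r"^\(C\) File .*?, line \d+, in (?P<func>.+) \((?P<lib>[^()]+)\)$")


def frame_functions(traceback_text: str) -> list[str]:
    """Return function names in the order pystack prints them (most recent call last)."""
    funcs = []
    for line in traceback_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            funcs.append(match.group("func"))
    return funcs


def shutdown_chain(funcs: list[str], start: str = "Py_FinalizeEx", stop: str = "abort") -> list[str]:
    """Frames from interpreter finalization up to (not including) the call to abort()."""
    return funcs[funcs.index(start):funcs.index(stop)]


# A few frames copied from thread 5277 above.
SAMPLE = """\
(C) File “???”, line 0, in Py_FinalizeEx (python)
(C) File “???”, line 0, in c10d::Reducer::~Reducer() (libtorch_cpu.so)
(C) File “???”, line 0, in c10::StorageImpl::~StorageImpl() (libtorch_python.so)
(C) File “???”, line 0, in __gxx_personality_v0 (libstdc++.so.6)
(C) File “./stdlib/abort.c”, line 79, in abort (libc.so.6)
"""

if __name__ == "__main__":
    print(" -> ".join(shutdown_chain(frame_functions(SAMPLE))))
```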

Traceback for thread 5408 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5404 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5405 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5406 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5411 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5410 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5409 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5423 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5414 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5412 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5415 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5413 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5424 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5418 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5429 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5420 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5417 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5421 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5422 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5425 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5427 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5437 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5432 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5434 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5436 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5435 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5441 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5439 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5445 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5442 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5440 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5428 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5444 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5447 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5450 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5449 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5448 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5451 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5454 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5456 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5446 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5455 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5452 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5416 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5459 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5457 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5460 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5461 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5463 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5465 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5675 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “???”, line 0, in PyThread_acquire_lock_timed (python)
(C) File “./nptl/sem_waitcommon.c”, line 183, in __new_sem_wait_slow64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 139, in __GI___futex_abstimed_wait_cancelable64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 87, in __futex_abstimed_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 57, in __futex_abstimed_wait_common64 (inlined) (libc.so.6)

Traceback for thread 5419 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5462 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5426 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5433 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5430 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5661 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “…/sysdeps/unix/sysv/linux/poll.c”, line 29, in __poll (libc.so.6)

Traceback for thread 6334 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “???”, line 0, in torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) (libtorch_python.so)
(C) File “???”, line 0, in torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) (libtorch_cpu.so)
(C) File “???”, line 0, in torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) (libtorch_cpu.so)
(C) File “???”, line 0, in torch::autograd::ReadyQueue::pop() (libtorch_cpu.so)
(C) File “???”, line 0, in std::condition_variable::wait(std::unique_lock<std::mutex>&) (libstdc++.so.6)
(C) File “./nptl/pthread_cond_wait.c”, line 627, in pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6)
(C) File “./nptl/pthread_cond_wait.c”, line 503, in __pthread_cond_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 139, in __GI___futex_abstimed_wait_cancelable64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 87, in __futex_abstimed_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 57, in __futex_abstimed_wait_common64 (inlined) (libc.so.6)

Traceback for thread 5468 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “???”, line 0, in c10d::detail::(anonymous namespace)::TCPStoreMasterDaemon::run() (libtorch_cpu.so)
(C) File “…/sysdeps/unix/sysv/linux/poll.c”, line 29, in __poll (libc.so.6)

Traceback for thread 5453 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5673 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “/pytorch/third_party/nccl/nccl/src/proxy.cc”, line 692, in ncclProxyProgress(void*) (libtorch_cuda.so)
(C) File “/pytorch/third_party/nccl/nccl/src/proxy.cc”, line 552, in ncclProxyGetPostedOps (inlined) (libtorch_cuda.so)
(C) File “./nptl/pthread_cond_wait.c”, line 627, in pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6)
(C) File “./nptl/pthread_cond_wait.c”, line 503, in __pthread_cond_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 139, in __GI___futex_abstimed_wait_cancelable64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 87, in __futex_abstimed_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 57, in __futex_abstimed_wait_common64 (inlined) (libc.so.6)

Traceback for thread 5458 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 6335 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “???”, line 0, in torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) (libtorch_python.so)
(C) File “???”, line 0, in torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) (libtorch_cpu.so)
(C) File “???”, line 0, in torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) (libtorch_cpu.so)
(C) File “???”, line 0, in torch::autograd::ReadyQueue::pop() (libtorch_cpu.so)
(C) File “???”, line 0, in std::condition_variable::wait(std::unique_lock<std::mutex>&) (libstdc++.so.6)
(C) File “./nptl/pthread_cond_wait.c”, line 627, in pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6)
(C) File “./nptl/pthread_cond_wait.c”, line 503, in __pthread_cond_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 139, in __GI___futex_abstimed_wait_cancelable64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 87, in __futex_abstimed_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 57, in __futex_abstimed_wait_common64 (inlined) (libc.so.6)

Traceback for thread 5464 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5431 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5672 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “/pytorch/third_party/nccl/nccl/src/proxy.cc”, line 1058, in ncclProxyService(void*) (libtorch_cuda.so)
(C) File “…/sysdeps/unix/sysv/linux/poll.c”, line 29, in __poll (libc.so.6)

Traceback for thread 5438 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 6336 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “???”, line 0, in PyThread_acquire_lock_timed (python)
(C) File “./nptl/sem_waitcommon.c”, line 183, in __new_sem_wait_slow64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 139, in __GI___futex_abstimed_wait_cancelable64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 87, in __futex_abstimed_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 57, in __futex_abstimed_wait_common64 (inlined) (libc.so.6)

Traceback for thread 5443 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5466 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5666 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “./nptl/pthread_cond_wait.c”, line 652, in pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6)
(C) File “./nptl/pthread_cond_wait.c”, line 503, in __pthread_cond_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 139, in __GI___futex_abstimed_wait_cancelable64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 87, in __futex_abstimed_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 57, in __futex_abstimed_wait_common64 (inlined) (libc.so.6)

Traceback for thread 6338 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “???”, line 0, in PyThread_acquire_lock_timed (python)
(C) File “./nptl/sem_waitcommon.c”, line 183, in __new_sem_wait_slow64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 139, in __GI___futex_abstimed_wait_cancelable64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 87, in __futex_abstimed_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 57, in __futex_abstimed_wait_common64 (inlined) (libc.so.6)

Traceback for thread 5407 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5663 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “???”, line 0, in c10d::ProcessGroupNCCL::ncclCommWatchdog() (libtorch_cuda.so)
(C) File “???”, line 0, in c10d::ProcessGroupNCCL::ncclCommWatchdogInternal() (libtorch_cuda.so)
(C) File “./nptl/pthread_cond_wait.c”, line 652, in pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6)
(C) File “./nptl/pthread_cond_wait.c”, line 503, in __pthread_cond_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 139, in __GI___futex_abstimed_wait_cancelable64 (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 87, in __futex_abstimed_wait_common (inlined) (libc.so.6)
(C) File “./nptl/futex-internal.c”, line 57, in __futex_abstimed_wait_common64 (inlined) (libc.so.6)

Traceback for thread 5664 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “…/sysdeps/unix/sysv/linux/poll.c”, line 29, in __poll (libc.so.6)

Traceback for thread 5659 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “???”, line 0, in c10d::detail::(anonymous namespace)::TCPStoreWorkerDaemon::run() (libtorch_cpu.so)
(C) File “…/sysdeps/unix/sysv/linux/poll.c”, line 29, in __poll (libc.so.6)

Traceback for thread 6613 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “…/sysdeps/unix/sysv/linux/poll.c”, line 29, in __poll (libc.so.6)
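Apart from thread 5277, every thread in this dump falls into one of two groups:

* A bare `__clone3` → `start_thread` stack with nothing deeper resolved.
* A thread blocked in `poll`, a condition-variable wait, or a lock/semaphore wait. These are the autograd engine workers, the NCCL proxy threads (`ncclProxyProgress`, `ncclProxyService`), the `ProcessGroupNCCL::ncclCommWatchdog`, the TCPStore master and worker daemons, and a few Python threads in `PyThread_acquire_lock_timed`.

None of them is the thread that aborted. The sketch below is not part of pystack. It groups threads by identical stacks so that a dump this long collapses to a handful of lines. The helper names and the `pystack.txt` filename are hypothetical.

```python
import re
from collections import defaultdict

HEADER_RE = re.compile(r"^Traceback for thread (\d+)")
# Captures the function name from a native frame line; the trailing "(lib.so)" is dropped.
FUNC_RE = re.compile(r", line \d+, in (.+) \([^()]+\)$")


def group_threads(dump: str) -> dict[tuple[str, ...], list[int]]:
    """Group thread ids by their full stack (tuple of function names, outermost first)."""
    groups = defaultdict(list)
    tid = None
    frames: list[str] = []
    for raw in dump.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            if tid is not None:  # close out the previous thread
                groups[tuple(frames)].append(tid)
            tid, frames = int(header.group(1)), []
        elif tid is not None and (frame := FUNC_RE.search(line)):
            frames.append(frame.group(1))
    if tid is not None:  # close out the last thread in the dump
        groups[tuple(frames)].append(tid)
    return dict(groups)


def summarize(dump: str) -> None:
    """Print one line per distinct stack, most common first, labelled by its innermost frame."""
    groups = group_threads(dump)
    for frames, tids in sorted(groups.items(), key=lambda item: -len(item[1])):
        leaf = frames[-1] if frames else "<no native frames>"
        print(f"{len(tids):3d} thread(s) ending in {leaf}: {tids}")


# Three threads copied from the dump above.
SAMPLE = """\
Traceback for thread 5408 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5404 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)

Traceback for thread 5468 (most recent call last):
(C) File “…/sysdeps/unix/sysv/linux/x86_64/clone3.S”, line 81, in __clone3 (libc.so.6)
(C) File “./nptl/pthread_create.c”, line 442, in start_thread (libc.so.6)
(C) File “???”, line 0, in c10d::detail::(anonymous namespace)::TCPStoreMasterDaemon::run() (libtorch_cpu.so)
(C) File “…/sysdeps/unix/sysv/linux/poll.c”, line 29, in __poll (libc.so.6)
"""

if __name__ == "__main__":
    # On the real output, something like: summarize(open("pystack.txt").read())
    summarize(SAMPLE)
```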