@mrshenli How do I catch exceptions raised by `rpc.rpc_async`, `rpc.rpc_sync`, and `rpc.remote` in the caller under the following conditions, assuming a timeout is set globally (or per call)?
- During execution, the target process crashes and exits, which also shuts down all of its RPC execution threads.
- During execution, the connection to the target process is closed.
- During execution, the timeout limit is reached.
- During execution, the executed function raises an exception.
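For concreteness, this is the shape of caller-side handling the question is after. The sketch is torch-free: a `concurrent.futures` thread pool stands in for an RPC future, and the helper `call_and_classify`, its outcome labels, and the assumption that a lost peer surfaces as a `ConnectionError` are all hypothetical. None of this is what `torch.distributed.rpc` actually raises.

```python
import concurrent.futures as cf
import time

# Hypothetical outcome labels, one per failure condition in the question.
OK = "ok"
PEER_GONE = "peer_gone"        # conditions 1 and 2: peer crashed or connection closed
TIMED_OUT = "timed_out"        # condition 3: the timeout was reached
REMOTE_ERROR = "remote_error"  # condition 4: the called function raised


def call_and_classify(executor, fn, *args, timeout=1.0):
    """Submit fn(*args) and map the outcome to (label, value_or_exception).

    Stand-in only: a thread-pool future plays the role of an RPC future, and
    ConnectionError is an assumed signal for a lost peer.
    """
    fut = executor.submit(fn, *args)
    try:
        return OK, fut.result(timeout=timeout)
    except cf.TimeoutError:
        # Only the wait is abandoned; fn keeps running in the pool.
        return TIMED_OUT, None
    except ConnectionError as exc:
        return PEER_GONE, exc
    except Exception as exc:
        return REMOTE_ERROR, exc


if __name__ == "__main__":
    def slow():
        time.sleep(0.5)
        return "late"

    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        print(call_and_classify(pool, slow, timeout=0.1))  # prints ('timed_out', None)
```

Note that `fut.result(timeout=...)` only stops the caller from waiting; the submitted work keeps running in the pool until it finishes on its own.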
Based on my experiments, my partial answer is:
- Not known?
- A `RuntimeError`, with a message along the lines of "peer reset".
- An uncatchable `std::runtime_error`, something like:
  ```
  terminate called after throwing an instance of 'std::runtime_error'
    what(): RPC ran for more than 5000 milliseconds and timed out.
  ```
- The exception raised by the function; not the original exception object, but one wrapped in a UDF exception and re-raised on the caller side.
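The fourth behavior, receiving a re-raised exception instead of the original object, is what one would expect whenever exceptions have to cross a process boundary as data. The sketch below illustrates that general pattern; it is not torch's implementation, and `RemoteError`, `run_udf`, and `unwrap` are hypothetical names. The callee turns any exception into a record holding its type and formatted traceback, and the caller re-raises a fresh exception of the same type.

```python
import traceback


class RemoteError:
    """Hypothetical record of an exception raised on the callee side."""

    def __init__(self, exc):
        self.exc_type = type(exc)
        # Format the traceback on the callee, where the traceback object is
        # still available (traceback objects cannot be pickled and sent).
        self.remote_tb = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )


def run_udf(fn, *args, **kwargs):
    """Callee side: never let the exception escape; return it as data."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        return RemoteError(exc)


def unwrap(result):
    """Caller side: re-raise a new exception of the recorded type."""
    if isinstance(result, RemoteError):
        # Assumes the type accepts a single message argument. The caller gets a
        # fresh object whose message embeds the remote traceback, not the original.
        raise result.exc_type(result.remote_tb)
    return result
```

Because only the type and the text survive, `except ValueError` still works on the caller, but attributes set on the original exception object are lost.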
The third one troubles me the most, because the `std::runtime_error` causes an ugly fatal Python error:
```
Fatal Python error: Aborted

Thread 0x00007f916abab700 (most recent call first):
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/world.py", line 63 in _rpc_call_remote_method
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/torch/distributed/rpc/internal.py", line 153 in _run_function

Thread 0x00007f91693a8700 (most recent call first):
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/world.py", line 75 in _rpc_get_remote_paired_value
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/torch/distributed/rpc/internal.py", line 153 in _run_function

Thread 0x00007f9163fff700 (most recent call first):
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/world.py", line 75 in _rpc_get_remote_paired_value
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/torch/distributed/rpc/internal.py", line 153 in _run_function

Thread 0x00007f91527fc700 (most recent call first):
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/torch/distributed/rpc/api.py", line 554 in rpc_sync
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/torch/distributed/rpc/api.py", line 77 in wrapper
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/world.py", line 756 in _rpc_paired_class_call
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/world.py", line 597 in rpc_paired_class_sync
  File "/home/Administrator/iffi/Projects/machin/test/parallel/distributed/test_world.py", line 97 in main
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/world.py", line 46 in _exec_role
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/thread.py", line 47 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap

Thread 0x00007f9152ffd700 (most recent call first):
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/election.py", line 423 in _task_timeout
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/thread.py", line 47 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap

Thread 0x00007f91537fe700 (most recent call first):
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/election.py", line 435 in _task_keep_alive
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/thread.py", line 47 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap

Thread 0x00007f9153fff700 (most recent call first):
  File "/usr/lib/python3.5/threading.py", line 297 in wait
  File "/usr/lib/python3.5/queue.py", line 173 in get
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/election.py", line 491 in _task_handle
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/thread.py", line 47 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap

Thread 0x00007f9160ff9700 (most recent call first):
  File "/usr/lib/python3.5/threading.py", line 293 in wait
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/event.py", line 66 in wait
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/role_dispatcher.py", line 234 in _task_dispatch
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/thread.py", line 47 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap

Thread 0x00007f91617fa700 (most recent call first):
  File "/usr/lib/python3.5/threading.py", line 293 in wait
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/event.py", line 66 in wait
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/distributed/world.py", line 302 in _task_run_dispatched_roles
  File "/usr/lib/python3.5/threading.py", line 862 in run
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/thread.py", line 47 in run
  File "/usr/lib/python3.5/threading.py", line 914 in _bootstrap_inner
  File "/usr/lib/python3.5/threading.py", line 882 in _bootstrap

Thread 0x00007f91e4362700 (most recent call first):
  File "/home/Administrator/iffi/Projects/machin/test/parallel/distributed/test_world.py", line 145 in subproc_start_world_with_roles
  File "/home/Administrator/iffi/Projects/machin/test/parallel/util_run_multi.py", line 16 in process_main
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93 in run
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/process.py", line 52 in run
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249 in _bootstrap
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 74 in _launch
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20 in __init__
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267 in _Popen
  File "/home/Administrator/iffi/Projects/machin/machin/parallel/process.py", line 25 in _Popen
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105 in start
  File "/home/Administrator/iffi/Projects/machin/test/parallel/util_run_multi.py", line 27 in processes
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/fixtures.py", line 788 in call_fixture_func
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/fixtures.py", line 964 in pytest_fixture_setup
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/fixtures.py", line 914 in execute
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/fixtures.py", line 584 in _compute_fixture_value
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/fixtures.py", line 503 in _get_active_fixturedef
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/fixtures.py", line 487 in getfixturevalue
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/fixtures.py", line 477 in _fillfixtures
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/fixtures.py", line 297 in fillfixtures
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/python.py", line 1483 in setup
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/runner.py", line 373 in prepare
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/runner.py", line 123 in pytest_runtest_setup
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/runner.py", line 217 in <lambda>
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/runner.py", line 244 in from_call
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/runner.py", line 217 in call_runtest_hook
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/runner.py", line 186 in call_and_report
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/runner.py", line 94 in runtestprotocol
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/runner.py", line 85 in pytest_runtest_protocol
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/main.py", line 272 in pytest_runtestloop
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/main.py", line 247 in _main
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/main.py", line 191 in wrap_session
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/main.py", line 240 in pytest_cmdline_main
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/manager.py", line 87 in <lambda>
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/Administrator/iffi/Projects/machin/venv/lib/python3.5/site-packages/_pytest/config/__init__.py", line 125 in main
  File "/data/software/pycharm/pycharm-2020.1.2/plugins/python/helpers/pycharm/_jb_pytest_runner.py", line 43 in <module>
```
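One plausible explanation for the third case, offered as a guess rather than a confirmed diagnosis: pybind11 translates a C++ exception into a Python one when it propagates out of a bound function call on the calling thread. If the `std::runtime_error` is instead thrown on an internal C++ thread of the RPC agent, nothing catches it, and an uncaught C++ exception calls `std::terminate`, which aborts the process. That would match the `terminate called after throwing an instance of 'std::runtime_error'` line and `Fatal Python error: Aborted`. The Python analogy below, with hypothetical `worker`, `fire_and_forget`, and `via_future` helpers, shows the same shape: an exception raised on a background thread never reaches the caller's `try`, while handing it to a future that the caller waits on makes it catchable.

```python
import concurrent.futures as cf
import threading


def worker():
    # Hypothetical stand-in for work that fails on an internal thread.
    raise RuntimeError("RPC ran for more than 5000 milliseconds and timed out.")


def fire_and_forget():
    """The exception dies in the background thread; the caller never sees it."""
    t = threading.Thread(target=worker)
    t.start()
    # join() returns normally. Python only reports the error on stderr and keeps
    # running, whereas an uncaught C++ exception on a thread calls std::terminate.
    t.join()
    return "caller continued"


def via_future():
    """The background thread hands its exception to a future the caller waits on."""
    fut = cf.Future()

    def run():
        try:
            fut.set_result(worker())
        except Exception as exc:
            fut.set_exception(exc)

    threading.Thread(target=run).start()
    return fut.result(timeout=5)  # re-raises the worker's RuntimeError here
```

In this analogy only `via_future` gives the caller something it can wrap in `try`/`except RuntimeError`.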
Is there a clean way to deal with the first three conditions? (The fourth one is simple.) And why doesn't pybind11 convert the `std::runtime_error` in the third case into a catchable Python `RuntimeError`?