I’ve installed torchx in a venv using Python 3.12:
(venv-vllm) ~/machine-learning/vllm $ python --version
Python 3.12.9
(venv-vllm) ~/machine-learning/vllm $ pip install "torchx[dev]"
...
Running a job locally works fine:
(venv-vllm) ~/machine-learning/vllm $ torchx run --scheduler local_cwd utils.python --script random-tensor.py
torchx 2025-02-19 00:31:07 INFO loaded configs from /Users/doug/.torchxconfig
torchx 2025-02-19 00:31:07 INFO Tracker configurations: {}
torchx 2025-02-19 00:31:07 INFO Log directory not set in scheduler cfg. Creating a temporary log dir that will be deleted on exit. To preserve log directory set the `log_dir` cfg option
torchx 2025-02-19 00:31:07 INFO Log directory is: /var/folders/_v/k_8qbtg17y1dlnzj1xhclvq80000gn/T/torchx_wrof_rpa
local_cwd://torchx/torchx_utils_python-p1bgkk16231mfd
torchx 2025-02-19 00:31:07 INFO Waiting for the app to finish...
python/0 tensor([[0.0408, 0.5921, 0.6242],
python/0 [0.7283, 0.4165, 0.4160],
python/0 [0.1438, 0.3326, 0.0690],
python/0 [0.3326, 0.3140, 0.1941],
python/0 [0.4170, 0.0451, 0.5431]])
torchx 2025-02-19 00:31:09 INFO Job finished: SUCCEEDED
But trying to run it in local Docker fails with a FileNotFoundError:
(venv-vllm) ~/machine-learning/vllm $ torchx run --scheduler local_docker utils.python --script random-tensor.py
torchx 2025-02-19 00:32:27 INFO loaded configs from /Users/doug/.torchxconfig
torchx 2025-02-19 00:32:27 INFO Tracker configurations: {}
torchx 2025-02-19 00:32:27 INFO Checking for changes in workspace `file:///Users/doug/machine-learning/vllm`...
torchx 2025-02-19 00:32:27 INFO To disable workspaces pass: --workspace="" from CLI or workspace=None programmatically.
Traceback (most recent call last):
  File "/Users/Shared/venv-vllm/lib/python3.12/site-packages/urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/Users/Shared/venv-vllm/lib/python3.12/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
...
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Shared/venv-vllm/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
...
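The [Errno 2] is raised inside urllib3's connection code, not when torchx opens random-tensor.py. My unconfirmed guess is that the Docker Python client behind the local_docker scheduler is trying to reach the Docker daemon over a Unix socket that does not exist. The stdlib-only sketch below (the socket path is made up, inside a throwaway temp directory) shows that connecting to a missing Unix socket produces exactly this message with no filename attached, whereas a failed open() on a missing script would read like "[Errno 2] No such file or directory: 'random-tensor.py'".

```python
import os
import socket
import tempfile


def describe_connect_error(path: str) -> str:
    """Connect to a Unix domain socket at `path` and describe any OSError raised."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.connect(path)
    except OSError as exc:
        return f"{type(exc).__name__}: {exc}"
    return "connected"


# Hypothetical socket path inside a fresh temporary directory, so it cannot exist.
with tempfile.TemporaryDirectory() as tmp:
    print(describe_connect_error(os.path.join(tmp, "docker.sock")))
# prints "FileNotFoundError: [Errno 2] No such file or directory"
```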
Trying to run on my Kubernetes cluster likewise fails with a FileNotFoundError:
(venv-vllm) ~/machine-learning/vllm $ torchx run --scheduler kubernetes --scheduler_args namespace=default,queue=default utils.echo --image alpine:latest --msg hello
torchx 2025-02-19 00:47:09 INFO loaded configs from /Users/doug/.torchxconfig
torchx 2025-02-19 00:47:09 INFO Tracker configurations: {}
torchx 2025-02-19 00:47:09 INFO Checking for changes in workspace `file:///Users/doug/machine-learning/vllm`...
torchx 2025-02-19 00:47:09 INFO To disable workspaces pass: --workspace="" from CLI or workspace=None programmatically.
Traceback (most recent call last):
  File "/Users/Shared/venv-vllm/lib/python3.12/site-packages/urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/Users/Shared/venv-vllm/lib/python3.12/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
...
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Shared/venv-vllm/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
An earlier Kubernetes attempt, running a Python script on a different queue, fails the same way:
(venv-vllm) ~/machine-learning/vllm $ torchx run --scheduler kubernetes --scheduler_args queue=test utils.python --script mps-first.py "Hello, torch"
torchx 2025-02-18 23:47:38 INFO Tracker configurations: {}
torchx 2025-02-18 23:47:38 INFO Checking for changes in workspace `file:///Users/doug/machine-learning/vllm`...
torchx 2025-02-18 23:47:38 INFO To disable workspaces pass: --workspace="" from CLI or workspace=None programmatically.
Traceback (most recent call last):
  File "/Users/Shared/venv-vllm/lib/python3.12/site-packages/urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/Users/Shared/venv-vllm/lib/python3.12/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
...
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Shared/venv-vllm/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
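Both the local_docker and kubernetes runs die right after the "Checking for changes in workspace" step. If I understand torchx correctly (not verified here), the kubernetes scheduler with a workspace enabled also builds an image through the local Docker daemon before submitting, which would explain the identical error. The sketch below checks where a Docker client would look for the daemon: it parses DOCKER_HOST, falling back to the usual unix:///var/run/docker.sock default, and probes a few candidate socket paths. The Docker Desktop and Colima paths in CANDIDATE_SOCKETS are assumptions about common macOS setups, not something taken from the logs.

```python
import os
import socket
from typing import List, Optional
from urllib.parse import urlparse

# Default daemon address when DOCKER_HOST is unset.
DEFAULT_DOCKER_HOST = "unix:///var/run/docker.sock"

# Assumed socket locations for common macOS setups (illustrative, adjust as needed).
CANDIDATE_SOCKETS = [
    "/var/run/docker.sock",
    os.path.expanduser("~/.docker/run/docker.sock"),       # Docker Desktop (assumed)
    os.path.expanduser("~/.colima/default/docker.sock"),   # Colima (assumed)
]


def socket_path_from_docker_host(docker_host: Optional[str]) -> Optional[str]:
    """Return the Unix socket path for a DOCKER_HOST value, or None for non-unix schemes."""
    url = urlparse(docker_host or DEFAULT_DOCKER_HOST)
    if url.scheme != "unix":
        return None  # e.g. tcp://host:2375 is not a local socket file
    return url.path


def is_listening(path: str) -> bool:
    """True if something accepts connections on the Unix socket at `path`."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(1.0)
            sock.connect(path)
        return True
    except OSError:  # FileNotFoundError, ConnectionRefusedError, timeouts, ...
        return False


def report() -> List[str]:
    """Describe the configured socket and each candidate socket."""
    configured = socket_path_from_docker_host(os.environ.get("DOCKER_HOST"))
    lines = [f"DOCKER_HOST resolves to: {configured}"]
    paths = list(CANDIDATE_SOCKETS)
    if configured and configured not in paths:
        paths.insert(0, configured)
    for path in paths:
        state = "listening" if is_listening(path) else "missing or not listening"
        lines.append(f"{path}: {state}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

If nothing reports "listening", the daemon is either not running or listening somewhere else. As far as I know, the docker CLI can find a non-default socket through a docker context, which the Python client does not read, so pointing DOCKER_HOST at the right unix:// path, or passing --workspace="" as the log suggests, is what I would try next.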
What should I be looking at?