Can't use PyTorch from latest docker image

Diego · September 14, 2022, 2:05pm

When I try to run PyTorch using the latest Docker image (nvcr.io/nvidia/pytorch:22.08-py3), it fails on import torch with the following stack trace:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/mnt/home/.local/lib/python3.8/site-packages/torch/__init__.py", line 811, in <module>
    from .functional import *  # noqa: F403
  File "/mnt/home/.local/lib/python3.8/site-packages/torch/functional.py", line 7, in <module>
    import torch.nn.functional as F
  File "/mnt/home/.local/lib/python3.8/site-packages/torch/nn/__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "/mnt/home/.local/lib/python3.8/site-packages/torch/nn/modules/__init__.py", line 2, in <module>
    from .linear import Identity, Linear, Bilinear, LazyLinear
  File "/mnt/home/.local/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 7, in <module>
    from .. import functional as F
  File "/mnt/home/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 18, in <module>
    from .._jit_internal import boolean_dispatch, _overload, BroadcastingList1, BroadcastingList2, BroadcastingList3
  File "/mnt/home/.local/lib/python3.8/site-packages/torch/_jit_internal.py", line 25, in <module>
    import torch.distributed.rpc
  File "/mnt/home/.local/lib/python3.8/site-packages/torch/distributed/__init__.py", line 55, in <module>
    from .distributed_c10d import *  # noqa: F403
  File "/mnt/home/.local/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 188, in <module>
    reduce_op = _reduce_op()
  File "/mnt/home/.local/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 176, in __init__
    for k, v in ReduceOp.__members__.items():
AttributeError: type object 'torch._C._distributed_c10d.ReduceOp' has no attribute '__members__'

Any ideas?

ptrblck · September 14, 2022, 4:41pm

It’s working for me:

Status: Downloaded newer image for nvcr.io/nvidia/pytorch:22.08-py3

=============
== PyTorch ==
=============

NVIDIA Release 22.08 (build 42105213)
PyTorch Version 1.13.0a0+d321be6
...
root@99ec9b88acce:/workspace# python 
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__path__
['/opt/conda/lib/python3.8/site-packages/torch']
>>>

Could you check if you are mounting any folders which might overwrite other files (and maybe the installation) inside the container? If not, did you install any packages which might do the same?
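
One way to check this without triggering the failing import is to ask Python where it would load torch from. The following is only a minimal sketch: the helper names locate_module and is_under_user_site are made up for illustration, and it assumes the suspect copy lives in the per-user site-packages directory (as the /mnt/home/.local/... paths in the traceback suggest).

```python
import importlib.util
import site
import sys


def locate_module(name):
    """Return the file a top-level module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_under_user_site(path):
    """True if path lies inside the per-user site-packages directory (e.g. ~/.local/...)."""
    user_site = site.getusersitepackages()
    return path is not None and path.startswith(user_site)


if __name__ == "__main__":
    origin = locate_module("torch")
    print("python executable:", sys.executable)
    print("torch would load from:", origin)
    print("user site enabled:", site.ENABLE_USER_SITE)
    print("shadowed by user site:", is_under_user_site(origin))
```

If the reported location is not under /opt/conda/lib/python3.8/site-packages, something outside the image (a mounted home directory, an exported environment variable, or a pip install --user) is probably taking precedence over the container's own installation.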

Diego · September 14, 2022, 5:05pm

Silly me, I was exporting my $PATH variable to the container, which made it believe that Python was installed somewhere else. Whoops!
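
For anyone hitting the same thing, here is a rough sketch of how the order of entries in PATH decides which python gets picked up. The function name resolve_on_path and the trimmed default PATH in the example are assumptions for illustration, not something taken from the image.

```python
import os
import shutil


def resolve_on_path(command, path_value):
    """Return the first executable named `command` found along `path_value`, or None."""
    return shutil.which(command, path=path_value)


if __name__ == "__main__":
    # Compare the PATH currently in effect with a hypothetical minimal one.
    current = os.environ.get("PATH", "")
    print("current PATH picks:", resolve_on_path("python", current))
    print("minimal PATH picks:", resolve_on_path("python", "/usr/local/bin:/usr/bin:/bin"))
```

If the two results differ, the exported PATH is steering the container toward a different interpreter than the one it ships with.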