PyTorch 2.0 error on T4

I installed PyTorch 2.0 on an EC2 instance with a T4 GPU. However, this code fails:

import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet50().cuda()
model = torch.compile(model)

x = torch.randn(1, 3, 224, 224).cuda()
out = model(x)
print(out.shape)

It fails with: torch._dynamo.exc.BackendCompilerFailed: debug_wrapper raised ImportError: cannot import name 'next_power_of_2' from 'triton' (unknown location)

Also, running the test via python pytorch/tools/dynamo/verify_dynamo.py gives module 'triton' has no attribute 'jit'.
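
For reference (not part of the original report), a minimal way to check the Triton installation directly, since both messages point at a broken or shadowed triton package:

import triton

# "(unknown location)" in the ImportError usually means Python resolved a stale
# or empty triton directory instead of the installed wheel.
print(getattr(triton, "__version__", "no __version__ attribute"))
print(getattr(triton, "__file__", "no __file__ (namespace package?)"))

# This is the exact import that the torch.compile backend fails on.
from triton import next_power_of_2
print(next_power_of_2(100))  # expected: 128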

System Info

Collecting environment information...
PyTorch version: 2.0.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.26.1
Libc version: glibc-2.31

Python version: 3.8.10 (default, Mar 13 2023, 10:26:41)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-1033-aws-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.7.64
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 515.43.04
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
...
Versions of relevant libraries:
[pip3] numpy==1.24.2
[pip3] pytorch-lightning==2.0.1
[pip3] torch==2.0.0
[pip3] torchaudio==2.0.1
[pip3] torchmetrics==0.11.4
[pip3] torchvision==0.15.1
[pip3] triton==2.0.0

Could you post the install command you used on this EC2 instance?
Was it just pip install torch?
I cannot reproduce the issue locally using 2.0.0+cu117, and the verification also works:

python tools/dynamo/verify_dynamo.py 
Python version: 3.8.16
`torch` version: 2.0.0+cu117
CUDA version: 11.7
ROCM version: None

All required checks passed

Yes, this was the command:
pip install torch torchvision torchaudio

I recreated the instance with CUDA 11.8, and that worked perfectly. I'm not sure why 11.7 wasn't working.
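
For reference, a quick way to confirm which wheel and GPU are actually active on an instance (a minimal check along the lines of the commands quoted in the next reply):

import torch

print(torch.__version__)              # e.g. 2.0.1+cu118
print(torch.version.cuda)             # CUDA version the wheel was built against
print(torch.cuda.get_device_name(0))  # should report "Tesla T4" on these instances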

  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised ImportError: cannot import name 'next_power_of_2' from 'triton' (unknown location)
XXX@dell-gpu-32:~/MSDA$ python -c "import torch; print(torch.__version__)"                                                                                                            
2.0.1+cu118
XXX@dell-gpu-32:~/MSDA$ python -c "import torch; print(torch.version.cuda)"
11.8

I have the same issue

  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 311, in transform
    tracer.run()
  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
    super().run()
  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
    and self.step()
  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
    getattr(self, inst.opname)(inst)
  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 1792, in RETURN_VALUE
    self.output.compile_subgraph(
  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 541, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/LAB/geling/.conda/envs/AAAI/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised ImportError: cannot import name 'next_power_of_2' from 'triton' (unknown location)

Set torch._dynamo.config.verbose=True for more information

I can reproduce the issue.
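
A smaller repro that exercises the same Triton code path without torchvision (a minimal sketch, not taken from the posts above) should hit the same ImportError on a broken install and run cleanly on a healthy one:

import torch

@torch.compile
def f(x):
    # any GPU op is enough to trigger the default inductor/Triton backend
    return torch.sin(x) + torch.cos(x)

print(f(torch.randn(8, device="cuda")))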

I still cannot reproduce the issue on a T4 with torch==2.0.1+cu118 and torchvision==0.15.2+cu118: the initially posted code runs without any error messages and returns torch.Size([1, 1000]).

I got the same issue after playing with the nightly release and then downgrading to 2.0.1.

I solved the issue by removing PyTorch, Triton, etc. before the 2.0.1 installation:

rm -rf ~/.local/lib/python3.10
sudo rm -rf /usr/local/lib/python3.10/dist-packages/tri*
sudo rm -rf /usr/local/lib/python3.10/dist-packages/tor*
sudo rm -rf /usr/local/lib/python3.10/dist-packages/pyto*
sudo rm -rf /usr/local/lib/python3.10/dist-packages/tran*
sudo rm -rf /usr/local/lib/python3.10/dist-packages/onnx*

Then install PyTorch 2.0.1 cu118:

sudo pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
sudo pip3 install transformers
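
After the reinstall, the originally posted snippet doubles as a sanity check (the expected output, per the reply above, is torch.Size([1, 1000])):

import torch
import torchvision.models as models

print(torch.__version__, torch.version.cuda)

model = torch.compile(models.resnet50().cuda())
x = torch.randn(1, 3, 224, 224).cuda()
print(model(x).shape)  # torch.Size([1, 1000])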