The recently released PyTorch 2.0 looks great, so I wanted to give torch.compile a try:
import torch
import torchvision
m = torchvision.models.resnet50().cuda()
mm = torch.compile(m)
data = torch.rand(1, 3, 224, 224).cuda()
o = mm(data)
but an error occurred:
/usr/local/anaconda3/lib/python3.7/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
warn(f"Failed to load image Python extension: {e}")
/usr/local/anaconda3/lib/python3.7/site-packages/torch/_dynamo/eval_frame.py:367: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')`
"TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled."
Traceback (most recent call last):
File "test.py", line 9, in <module>
o = mm(data)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_dynamo/eval_frame.py", line 82, in forward
return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_dynamo/eval_frame.py", line 211, in _fn
return fn(*args, **kwargs)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torchvision/models/resnet.py", line 284, in forward
def forward(self, x: Tensor) -> Tensor:
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_dynamo/eval_frame.py", line 211, in _fn
return fn(*args, **kwargs)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_functorch/aot_autograd.py", line 2343, in forward
return compiled_fn(full_args)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_functorch/aot_autograd.py", line 887, in g
return f(*args)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_functorch/aot_autograd.py", line 1906, in debug_compiled_function
return compiled_function(*args)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_functorch/aot_autograd.py", line 1718, in compiled_function
all_outs = CompiledFunction.apply(*args_with_synthetic_bases)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/autograd/function.py", line 419, in apply
return super().apply(*args, **kwargs)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_functorch/aot_autograd.py", line 1584, in forward
disable_amp=disable_amp,
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_functorch/aot_autograd.py", line 912, in call_func_with_args
out = normalize_as_list(f(args))
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_inductor/compile_fx.py", line 199, in run
return model(new_inputs)
File "/tmp/torchinductor_root/4r/c4rjyj24xhy4dcta2noqlpecsq3ewxmrtw543abb7vzx47ah2yj5.py", line 2313, in call
triton_fused_convolution_mean_var_var_1_2.run(buf0, buf2, buf3, buf5, buf7, 128, 6272, grid=grid(128), stream=stream0)
File "/usr/local/anaconda3/lib/python3.7/site-packages/torch/_inductor/triton_ops/autotune.py", line 180, in run
stream=stream,
File "<string>", line 6, in launcher
RuntimeError: Triton Error [CUDA]: invalid argument
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007f72b10c03e9 ***
My environment is:
cuda: 11.7
python: 3.9
gpu: a100
cuda driver version: 470.57.02
I set up the environment following the documentation: https://pytorch.org/get-started/pytorch-2.0/#requirements
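
The traceback paths point at a python3.7 site-packages directory while the list above says Python 3.9, so it may be worth confirming which interpreter and which torch build are actually being picked up. Below is a minimal sketch for that, not an official tool: the helper name `collect_env` and the dictionary keys are made up for illustration, and it only uses `platform`, `sys`, and a few torch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`).

```python
import platform
import sys


def collect_env():
    """Gather the versions that the running interpreter actually sees.

    Hypothetical helper for illustration; the key names are arbitrary.
    """
    info = {
        "python": platform.python_version(),
        "executable": sys.executable,
    }
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing or failed to load its shared libraries
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version this torch build was compiled against (None for CPU-only builds)
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running this with the same `python` that runs test.py should show whether the reported environment matches the one in use. PyTorch also ships a fuller report via `python -m torch.utils.collect_env`.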