I noticed that when building PyTorch v2 it wasn't using TensorRT because it couldn't find it. I fixed that, but then hit errors trying to compile. It turns out PyTorch vendors 'third_party/onnx-tensorrt', and that copy is so old it only supports the ancient TRT v7. I replaced it with the newest pull of onnx-tensorrt, which supports the very recent TRT v8.5.3. That lets the build succeed, although I had to hard-code TENSORRT_LIBRARY_INFER_PLUGIN in a makefile.
With some other work I won't detail here, I got AUTOMATIC1111 Stable Diffusion running on this PyTorch build with TensorRT enabled.
I did the sd_model = torch.compile(sd_model) thing and added a few @torch.compile decorators to the code. When I try to generate an image I get:
'sm_89' is not a recognized processor for this target (ignoring processor)
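For context, the compile pattern looks roughly like this (a minimal sketch; the model and the scale_latents function are stand-ins, not the actual A1111 code):

    import torch
    import torch.nn as nn

    # Stand-in for A1111's loaded Stable Diffusion model.
    sd_model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())

    # Whole-model compile, as in my setup.
    sd_model = torch.compile(sd_model)

    # Plus a few function-level decorators sprinkled into the code
    # (hypothetical example function).
    @torch.compile
    def scale_latents(x: torch.Tensor, factor: float) -> torch.Tensor:
        return x * factor

    print(sd_model(torch.randn(1, 4)))
    print(scale_latents(torch.randn(2, 2), 0.5))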
I can't find the source code that emits this, to determine what is happening. sm_89 is my Ada-architecture 4090. Is it being passed as a CPU processor type for some reason? I have CUDA 11.8, which supports sm_89. sm_89 was detected by the PyTorch build, so the build should support it, and TRT v8.5.3 just came out, so it should support it too.
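For what it's worth, this is how I sanity-check that the running build actually knows about sm_89 (standard torch APIs):

    import torch

    print(torch.cuda.get_device_capability(0))  # (8, 9) on an Ada 4090
    print(torch.cuda.get_arch_list())           # my source build lists 'sm_89'
    print(torch.version.cuda)                   # 11.8 in my case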
The ~32 messages I get, all starting with 'sm_89', have no stack traces. They come from some 32 worker processes, probably because I have 32 CPUs. Given that it says 'ignoring processor', it is unlikely to be a hard failure, hence no stack trace. It seems to be confusing the GPU type with a CPU, but I don't know where this is coming from.
Then after that I do get failures when it tries to execute certain functions I've marked with @torch.compile:
tqdm/utils.py:77
return self._comparable == other._comparable
AttributeError: 'function' object has no attribute '_comparable'
I printed the type of 'self' and it is <class 'tqdm.asyncio.tqdm_asyncio'>.
If I work around this by modifying the __eq__ routine to check whether the object has the attribute first, then I get:
File "/home/dwood/TRT/a1111/venv/lib/python3.10/site-packages/torch/_dynamo/variables/nn_module.py", line 124, in var_getattr
subobj = inspect.getattr_static(base, name)
File "/usr/lib/python3.10/inspect.py", line 1774, in getattr_static
raise AttributeError(attr)
AttributeError: cond_stage_key
from user code:
File "/home/dwood/TRT/a1111/modules/sd_samplers_kdiffusion.py", line 87, in forward
is_edit_model = shared.sd_model.cond_stage_key == "edit" and self.image_cfg_scale is not None and self.image_cfg_scale != 1.0
NOTE that sd_model here is the result of torch.compile. The SD code expects it to still expose cond_stage_key even after compilation. If I compile the model but leave off the @torch.compile decorators, it runs with no error.
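One workaround that fits what I'm seeing: in the PyTorch 2.0 builds I've looked at, the compiled wrapper keeps the original module in _orig_mod, so config attributes can be read from there instead of from the wrapper. A minimal sketch (FakeLDM is a stand-in, not the real ldm class):

    import torch
    import torch.nn as nn

    class FakeLDM(nn.Module):
        # Stand-in for ldm's LatentDiffusion, which sets this in __init__.
        def __init__(self):
            super().__init__()
            self.cond_stage_key = "txt"

        def forward(self, x):
            return x

    sd_model = torch.compile(FakeLDM())

    # Read config attributes off the original module rather than the
    # compiled wrapper, sidestepping the failing static lookup.
    orig = getattr(sd_model, "_orig_mod", sd_model)
    is_edit_model = orig.cond_stage_key == "edit"
    print(is_edit_model)  # False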
The "'sm_89' is not a recognized processor for this target (ignoring processor)" message seems to be raised by LLVM from here, which is used by OpenAI/Triton.
I don't know why your GPU architecture is passed to it as a CPU architecture.
The second issue:
tqdm/utils.py:77
return self._comparable == other._comparable
AttributeError: 'function' object has no attribute '_comparable'
seems to fail in torch.compile when your Python script uses tqdm, so you might need to remove it (you should see its usage in a for loop).
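If it helps, "removing it" can be as simple as not wrapping the iterator inside the compiled region, or driving the progress bar from uncompiled code. A rough sketch (total_steps is a stand-in):

    from tqdm import tqdm

    total_steps = 20  # stand-in value

    # Inside a function under torch.compile, iterating via
    # tqdm(range(total_steps)) can hit tqdm's __eq__ path.
    # Plain iteration avoids tqdm entirely:
    for step in range(total_steps):
        pass

    # Or keep the bar but update it from uncompiled code:
    bar = tqdm(total=total_steps)
    for step in range(total_steps):
        bar.update(1)
    bar.close()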
Thanks.
First issue: that helps by narrowing the focus of what I'll debug if this ever comes back. It mysteriously disappeared just when I tried to attach a debugger and has never come back.
Second issue:
To get past it I added the following guard at the top of tqdm's __eq__(self, other):

    # 'other' can be a plain function object under torch._dynamo
    if not hasattr(other, '_comparable'):
        return False
Hmmm, just after I typed the above I found that on Sep 2, 2022 a more complete version of this fix was made by the tqdm folks. Yet another package I need to update to the latest.
But the 'sm_89' message does reappear occasionally, though it isn't a hard failure. I just wonder if I'm leaving performance on the table because of this, if it ends up compiling for a generic CPU when I actually have a Raptor Lake. Perhaps I'll run with a library interposer, and when 'sm_89' is sent in a write() to STDOUT I can hang the process so I can attach a debugger and work backwards from there.
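Something like this Python stand-in for the interposer might work, assuming it's acceptable to freeze the whole process group when the string appears (script name and details are hypothetical):

    # trap_on_output.py -- run a command, watch its stdout, and SIGSTOP
    # the whole process group when the trigger string appears so a
    # debugger can be attached (gdb -p <pid>).
    import os
    import signal
    import subprocess
    import sys

    TRIGGER = "sm_89"

    proc = subprocess.Popen(
        sys.argv[1:],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # child leads its own process group
    )
    for line in proc.stdout:
        sys.stdout.write(line)
        if TRIGGER in line:
            # Freeze every process in the group, workers included.
            os.killpg(proc.pid, signal.SIGSTOP)
            print(f"*** group {proc.pid} stopped; attach with: gdb -p {proc.pid}")
            break
    proc.wait()  # blocks until the group is resumed and exits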
I mentioned you on the other issue, as I figured out what was going wrong with the PyTorch attribute-lookup mechanism, which apparently does NOT work for members of the original model class once it is compiled. There were a number of issues I had to debug and fix or hack around to get @torch.compile to actually work. I wonder if the PyTorch folks need engineering help to reach GA?