Torch v2.0 and TensorRT v8

I noticed that when building PyTorch v2 it wasn’t using TRT because it couldn’t find it. I fixed that, but then hit errors trying to compile. It turns out PyTorch carries ‘third_party/onnx-tensorrt’ as a submodule, and it is so old it only supports the ancient TRT v7. I replaced it with the newest pull of onnx-tensorrt, which supports the very recent TRT v8.5.3. This lets the build succeed, although I had to hard-code TENSORRT_LIBRARY_INFER_PLUGIN in a makefile.

With some other work I won’t detail here, I got AUTOMATIC1111 Stable Diffusion running on PyTorch with TensorRT enabled.
I did the sd_model = torch.compile(sd_model) thing and added a few “@torch.compile” decorators to the code. When I try to generate an image I get:
'sm_89' is not a recognized processor for this target (ignoring processor)
I can’t find the source code for this message to determine what is happening. sm_89 is my Ada-architecture 4090. Is it being passed as a CPU processor type for some reason? I have CUDA 11.8, which supports sm_89; sm_89 was detected by the PyTorch build, so it should be supported; and TRT v8.5.3 just came out, so it should support it as well.
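For context, the compile usage is essentially this (a minimal sketch; sd_model and denoise_step here are stand-ins for the real A1111 objects):

    import torch
    import torch.nn as nn

    # Stand-in for the loaded Stable Diffusion model (the real one comes from A1111).
    sd_model = nn.Sequential(nn.Linear(4, 4), nn.SiLU(), nn.Linear(4, 4))

    # The whole-model compile, i.e. the "sd_model = torch.compile(sd_model)" thing.
    sd_model = torch.compile(sd_model)

    # Plus a few hot functions decorated directly.
    @torch.compile
    def denoise_step(x: torch.Tensor, sigma: float) -> torch.Tensor:
        # Placeholder for the k-diffusion update; just something for Inductor to trace.
        return x - sigma * torch.tanh(x)

    out = sd_model(torch.randn(1, 4))
    step = denoise_step(torch.randn(1, 4), 0.5)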

Any ideas?

Could you post the stacktrace to see which part of the build is raising the error, please?

The ~32 messages I get, all starting with ‘sm_89’, have no stack traces. They come from some 32 worker processes, probably because I have 32 CPUs. Given that it says ‘ignoring processor’, it is unlikely to be a hard failure, hence no stack trace. It seems to be confusing the GPU type with a CPU type, but I don’t know where this is coming from.

Then after that I do get failures when it tries to execute certain functions I’ve marked with “@torch.compile”:
tqdm/utils.py:77
return self._comparable == other._comparable
AttributeError: 'function' object has no attribute '_comparable'
I printed the type of ‘self’ and it is <class 'tqdm.asyncio.tqdm_asyncio'>

If I fix this by modifying the __eq__ routine to first check whether the object has the attribute, then I get:
File "/home/dwood/TRT/a1111/venv/lib/python3.10/site-packages/torch/_dynamo/variables/nn_module.py", line 124, in var_getattr
subobj = inspect.getattr_static(base, name)
File "/usr/lib/python3.10/inspect.py", line 1774, in getattr_static
raise AttributeError(attr)
AttributeError: cond_stage_key

from user code:
File "/home/dwood/TRT/a1111/modules/sd_samplers_kdiffusion.py", line 87, in forward
is_edit_model = shared.sd_model.cond_stage_key == "edit" and self.image_cfg_scale is not None and self.image_cfg_scale != 1.0

NOTE that sd_model here is the result of torch.compile. The SD code expects cond_stage_key to still be reachable on it even after it has been compiled. If I run the model without the @torch.compile, it runs with no error.
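Here is a small stand-alone repro of the kind of access that trips it up (TinyLDM is just a stand-in for LatentDiffusion, and I’m assuming the wrapper returned by torch.compile is Dynamo’s OptimizedModule):

    import torch
    import torch.nn as nn

    class TinyLDM(nn.Module):
        def __init__(self):
            super().__init__()
            self.cond_stage_key = "txt"   # plain Python attribute, like LatentDiffusion's
            self.proj = nn.Linear(4, 4)

        def forward(self, x):
            return self.proj(x)

    model = TinyLDM()
    compiled = torch.compile(model)

    # Eager access is forwarded to the wrapped module, but Dynamo's static lookup
    # while tracing a frame that reads shared.sd_model.cond_stage_key is where the
    # AttributeError above comes from.
    print(compiled.cond_stage_key)

    # Workaround I'd try: read such config attributes off the original module.
    original = getattr(compiled, "_orig_mod", compiled)
    print(original.cond_stage_key)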

The ‘sm_89’ is not a recognized processor for this target (ignoring processor) message seems to be raised by LLVM from here, which is used by OpenAI/Triton.
I don’t know why your GPU architecture is passed as a CPU architecture to it.
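As a quick sanity check that sm_89 itself is known to your PyTorch build (a separate question from what Triton/LLVM is doing with the string), something like this should report (8, 9) and list sm_89:

    import torch

    print(torch.version.cuda)                    # e.g. 11.8
    print(torch.cuda.get_device_capability(0))   # an RTX 4090 should report (8, 9)
    print(torch.cuda.get_arch_list())            # the arch list this build was compiled for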

The second issue:

tqdm/utils.py:77
return self._comparable == other._comparable
AttributeError: 'function' object has no attribute '_comparable'

seems to fail in torch.compile when your Python script uses tqdm, so you might need to remove it (you should see its usage in a for loop).
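If the tqdm loop lives inside a function you decorated, another option is to keep the progress bar outside the compiled region, roughly like this (a sketch, not the actual A1111 code):

    import torch
    from tqdm import tqdm

    @torch.compile
    def step(x):
        # Only the numeric work is compiled.
        return torch.tanh(x) * 0.5

    x = torch.randn(8)
    # The tqdm progress bar stays in the outer, uncompiled loop, so Dynamo never
    # has to trace through tqdm internals (where the _comparable comparison lives).
    for _ in tqdm(range(10)):
        x = step(x)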

Thanks.
First issue: that helps by narrowing the focus of what I’ll debug if this ever comes back. It mysteriously disappeared just as I tried to attach a debugger and has never come back.
Second issue:
To get past it I added the following to tqdm’s __eq__(self, other):
if not hasattr(other, '_comparable'):
    return False
Hmmm, just after I typed the above I found that on Sep 2, 2022 a more complete version of this fix was made by the tqdm folks. Yet another package I need to update to the latest.
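For anyone else hitting this, the same workaround can be applied without editing the installed package by monkeypatching it at startup (a sketch; updating tqdm is the cleaner fix):

    from tqdm.utils import Comparable

    _orig_eq = Comparable.__eq__

    def _safe_eq(self, other):
        # Dynamo can end up comparing a tqdm instance against a plain function
        # object, which has no _comparable attribute; treat that as "not equal".
        if not hasattr(other, '_comparable'):
            return False
        return _orig_eq(self, other)

    Comparable.__eq__ = _safe_eq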

OK, let me know how it goes as it seems some issues don’t show up anymore. :slight_smile:

But it does reappear occasionally; it just isn’t a hard failure. I wonder whether I’m losing performance because of it, if it results in compiling for a generic CPU when I actually have a Raptor Lake. Perhaps I’ll run with a library interposer and, when ‘sm_89’ is sent in a write() to STDOUT, hang the process so I can attach a debugger and work backwards from there.

I mentioned you on the other issue, as I figured out what was going wrong with the PyTorch attribute lookup mechanism, which apparently does NOT work for members of the original model class once it has been compiled. There are a number of issues I had to debug and fix or hack around to get ‘@torch.compile’ to actually work. I wonder if the PyTorch folks need engineering help to reach GA?

I’m sure the torch.compile devs would be happy about any support and you can surely contribute to PyTorch to improve it.