Using Dynamo and the TensorRT backend for structured sparsity

I am trying to see how much structured sparsity can improve the performance of the Hugging Face Llama 2 model. I read this doc, and it seems I can use the TensorRT backend with sparse_weights turned on. However, it sounds like acc_tracer, and JIT tracing in general, doesn’t play very nicely with control flow.

If Dynamo can output FX graphs easily, is it possible to run Dynamo on the model first and then simply use TRTInterpreter on the resulting FX graph for sparse-weight acceleration? Any tips would be appreciated!

So you should be able to use this with partial graphs.
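To illustrate what partial graphs look like in practice, here is a minimal sketch, assuming a recent PyTorch 2.x install. It registers a custom torch.compile backend that only records the FX graphs Dynamo hands over and then runs them unchanged. The names Toy, capture_backend, and captured_graphs are made up for this example, and nothing here calls TensorRT; a real backend would lower each captured graph instead of returning it as-is.

```python
import torch
import torch.fx

# Hypothetical container for every FX graph Dynamo passes to the backend.
captured_graphs = []


def capture_backend(gm: torch.fx.GraphModule, example_inputs):
    """Custom torch.compile backend: record the graph, then run it as-is."""
    captured_graphs.append(gm)
    return gm.forward


class Toy(torch.nn.Module):
    """Toy model with a data-dependent branch that Dynamo cannot trace through."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        y = self.linear(x)
        if y.sum() > 0:  # Python control flow on a tensor value
            return torch.relu(y)
        return y - 1


model = Toy().eval()
compiled = torch.compile(model, backend=capture_backend)

x = torch.randn(4, 8)
with torch.no_grad():
    out = compiled(x)

# Each entry is a partial graph covering one traceable region of forward().
print(f"captured {len(captured_graphs)} graph(s)")
for gm in captured_graphs:
    print(gm.graph)
```

Each captured torch.fx.GraphModule is free of Python control flow, so it is the kind of object an FX-based converter can consume, while the branch itself keeps running in regular Python.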

I see, thanks for the help! I checked py/torch_tensorrt/dynamo/ and it doesn’t seem to have options to explicitly enable or disable sparsity. Is structured sparsity used behind the scenes somehow? I mainly want to test the performance difference with and without sparsity acceleration enabled.
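One thing worth checking before benchmarking: as far as I understand, TensorRT's sparse kernels only apply to weights that already follow the 2:4 pattern (at most two nonzero values in every group of four consecutive weights), so a dense checkpoint would show no difference either way. Below is a minimal pure-Python sketch of that pattern on a single weight row. The helpers prune_2_to_4 and is_2_to_4_sparse are hypothetical names for this post, not part of torch_tensorrt or TensorRT, and magnitude-based pruning is just one simple way to produce the pattern.

```python
def prune_2_to_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        by_magnitude = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        keep = set(by_magnitude[:2])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def is_2_to_4_sparse(row):
    """Return True if every group of four has at most two nonzero values."""
    if len(row) % 4 != 0:
        return False
    return all(
        sum(1 for v in row[start:start + 4] if v != 0) <= 2
        for start in range(0, len(row), 4)
    )


dense_row = [0.1, -0.9, 0.5, 0.05, 1.0, 2.0, 3.0, 4.0]
sparse_row = prune_2_to_4(dense_row)
print(sparse_row)
print(is_2_to_4_sparse(dense_row), is_2_to_4_sparse(sparse_row))
```

Running a check like is_2_to_4_sparse over each linear layer's weights would at least tell me whether a with/without comparison can show any difference for a given checkpoint.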