I'm trying to see how much structured sparsity can improve the performance of the Hugging Face Llama 2 model. I read this doc, and it seems I can use the TensorRT backend with sparse_weights turned on. However, it sounds like acc_tracer, and jit in general, doesn't play very nicely with control flow.

If dynamo can output FX graphs easily, is it possible to run dynamo on the model first and then simply use TRTInterpreter on the resulting FX graph for sparse-weight acceleration? Any tips would be appreciated!
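
To make the question concrete, here is a minimal sketch of the dynamo step I have in mind. It uses a toy module with a data-dependent branch as a stand-in for Llama 2, and a custom torch.compile backend that only records the FX graphs dynamo produces. The names Toy, capture_backend, and captured_graphs are hypothetical and just for illustration. The hand-off to TRTInterpreter is deliberately left out, since whether it accepts these graphs is exactly the part I'm unsure about.

```python
import torch
import torch.fx

# Hypothetical container for the FX graphs that dynamo hands to the backend.
captured_graphs = []


def capture_backend(gm: torch.fx.GraphModule, example_inputs):
    # Dynamo calls the backend once per captured graph.
    # Record the graph and run it unmodified (no TensorRT lowering here).
    captured_graphs.append(gm)
    return gm.forward


class Toy(torch.nn.Module):
    """Toy stand-in for the real model, with data-dependent control flow."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        y = self.linear(x)
        if y.sum() > 0:  # branch depends on tensor values
            return torch.relu(y)
        return y * 2


model = Toy().eval()
compiled = torch.compile(model, backend=capture_backend)

x = torch.randn(2, 8)
with torch.no_grad():
    out = compiled(x)

# Inspect what dynamo captured; each entry is a torch.fx.GraphModule.
for i, gm in enumerate(captured_graphs):
    print(f"--- graph {i} ---")
    print(gm.graph)
```

As far as I understand, dynamo handles the branch by splitting the model into more than one graph rather than failing, so the open question is whether each captured GraphModule could then be lowered separately with sparse_weights enabled.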