I’m trying to use ExecuTorch to run a quantized Llama 3.2 3B QAT+LoRA model in an iOS app. The demo app works with the .pte file I generated for that model, but when I went to add ExecuTorch to my own app, it seems like a lot of the code the demo app relies on isn’t available through the ExecuTorch Swift package. The demo app uses a bunch of stuff in executorch/extension/llm, and as far as I can tell, the package only exposes executorch/extension/module/, executorch/extension/tensor/, and executorch/runtime/* (but I could be wrong — I’m not super-familiar with C++!).
Is there a way you’d recommend using the code from executorch/extension/llm in my app? Or is that outside the scope of the ExecuTorch library?
So far I’ve tried:
- copying the files from executorch/extension/llm directly into my app (license permitting, of course), but that seemed brittle since the copies wouldn’t pick up upstream changes, and I also ran into issues with the CMake build script
- getting by with only the code in module.h, like the SwiftPM example (https://pytorch.org/executorch/main/_static/img/swiftpm_xcode.mp4) does, but my use case seems to need most of the text-specific pieces (tokenizer, text decoder, text prefiller, text token generator), so that didn’t work
Thanks!