Sayli_D
(Sayli Dighde)
November 22, 2023, 5:17pm
> **Using TorchServe on SageMaker Inf2.24xlarge with Llama2-13B**
>
> ## Contents
>
> This notebook uses the SageMaker notebook instance conda_python3 kernel and demonstrates how to use TorchServe to deploy Llama-2-13B on SageMaker inf2.24xlarge. There are multiple advanced features in this example.
>
> * Neuronx AOT precompile model
>
> (file truncated)
I could create an endpoint as above for the Llama-2-13B base model, but for the 13B chat model endpoint creation fails with a timeout error on the primary container.
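One thing worth checking for a startup timeout with a large Neuron model is the endpoint's container startup health-check window, since loading/compiling a 13B model can exceed the default. Below is a minimal sketch of a production-variant config that raises `ContainerStartupHealthCheckTimeoutInSeconds` (the model and config names here are hypothetical placeholders, not from the notebook):

```python
def make_variant(model_name: str, timeout_s: int = 1800) -> dict:
    """Production variant with an extended container startup health-check window."""
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,                    # hypothetical model name
        "InstanceType": "ml.inf2.24xlarge",
        "InitialInstanceCount": 1,
        # Large Neuron models can take longer than the default window to load.
        "ContainerStartupHealthCheckTimeoutInSeconds": timeout_s,
    }

# Pass this to SageMaker when creating the endpoint config, e.g.:
# import boto3
# boto3.client("sagemaker").create_endpoint_config(
#     EndpointConfigName="llama2-13b-chat-config",  # hypothetical name
#     ProductionVariants=[make_variant("llama2-13b-chat")],
# )
```

If the base model deploys but the chat model times out, a longer window at least rules out slow startup as the cause.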
For the above, I created the Neuron artifacts for the 13B chat model using the following:
```bash
cd serve
# Install dependencies
python ts_scripts/install_dependencies.py --neuronx --environment=dev
# Install torchserve and torch-model-archiver
python ts_scripts/install_from_src.py
# Install additional neuron packages, SDK 2.12.2: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/content.html#id8
python -m pip install neuronx-cc==2.8.0.25 torch-neuronx==1.13.1.1.9.1 transformers-neuronx==0.5.58
# Navigate to `examples/large_models/inferentia2/llama2` directory
cd examples/large_models/inferentia2/llama2/
# Install additional necessary packages
python -m pip install -r requirements.txt
```
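Neuron problems like this are often version mismatches, so it can help to confirm the pinned packages from the commands above are what is actually installed. A small sketch using the standard library (`importlib.metadata`); the pins are copied from the install commands:

```python
from importlib.metadata import version, PackageNotFoundError

# Versions pinned by the install commands above (Neuron SDK 2.12.2)
pinned = {
    "neuronx-cc": "2.8.0.25",
    "torch-neuronx": "1.13.1.1.9.1",
    "transformers-neuronx": "0.5.58",
}

for pkg, want in pinned.items():
    try:
        got = version(pkg)
        print(pkg, got, "OK" if got == want else f"expected {want}")
    except PackageNotFoundError:
        print(pkg, "not installed")
```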
### Step 3: Save the model artifacts compatible with `transformers-neuronx`

In order to use the pre-compiled model artifacts, copy them from the model zoo using the command shown below and skip to **Step 5**. (The copied command was truncated.)
I could start TorchServe and run inference via a curl command here, so the model artifacts look okay. But the same artifacts won't work in the first notebook referenced above.