Hi, I’d like to create a smaller Llama model based on the LLaMA2-7b-hf model. In particular, I’d like to reduce the number of layers and add skip connections between certain layers. Can anyone help me do this properly?
Do you have a dataset on which the outputs should be similar?
If so, you can train your smaller model to match the original one’s outputs (and likely also its intermediate activations).
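If you go that route, the loss could look something like this. Just a sketch: it assumes the model outputs expose `.logits` and `.hidden_states` (e.g. Hugging Face models called with `output_hidden_states=True`), and the weighting and layer pairing are arbitrary placeholders.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, layer_map, alpha=0.5, T=2.0):
    """Sketch: match the teacher's output distribution (KL on softened
    logits) plus a few intermediate hidden states (MSE).
    `layer_map` pairs student layer indices with teacher layer indices."""
    kl = F.kl_div(
        F.log_softmax(student_out.logits / T, dim=-1),
        F.softmax(teacher_out.logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hidden = sum(
        F.mse_loss(student_out.hidden_states[s], teacher_out.hidden_states[t])
        for s, t in layer_map
    )
    return alpha * kl + (1 - alpha) * hidden
```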
Best regards
Thomas
Sorry if the original question wasn’t clear. I’m not looking for similar outputs from the smaller model at this point. I just want to create a smaller model that has the same structure as llama2-7b, except that it has fewer layers and additional skip connections. Other than that, the two models should be the same (e.g. embedding layer, lm_head, embedding dimension, tokenizer, etc.).
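Concretely, for the “fewer layers, same embeddings/lm_head/tokenizer” part, something like this is what I have in mind (just a rough sketch; the layer count and the way the remaining layers are initialised are placeholders, and the skip connections would still need a modified forward pass):

```python
from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM

name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)   # unchanged tokenizer
base = LlamaForCausalLM.from_pretrained(name)

# same config (hidden size, vocab, heads, ...) but fewer decoder layers
config = LlamaConfig.from_pretrained(name)
config.num_hidden_layers = 16                     # placeholder: keep half

small = LlamaForCausalLM(config)

# reuse the embedding layer, final norm and lm_head of the full model
small.model.embed_tokens.load_state_dict(base.model.embed_tokens.state_dict())
small.model.norm.load_state_dict(base.model.norm.state_dict())
small.lm_head.load_state_dict(base.lm_head.state_dict())

# e.g. initialise the remaining blocks from every other original layer
for i, layer in enumerate(small.model.layers):
    layer.load_state_dict(base.model.layers[2 * i].state_dict())
```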
The easiest would likely be to take a very simple implementation and modify it to your needs.
The ones I’m most aware of are Lit-GPT (disclaimer: I contributed a patch or two there) and A. Karpathy’s llama2.c (which has a Python model for training and a training example on smaller configurations).
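For the skip connections themselves, the change is mostly in the layer loop of the forward pass. A rough sketch of the idea (the block constructor, layer count, and skip pairs are placeholders, not anything specific from Lit-GPT or llama2.c):

```python
import torch.nn as nn

class DecoderStackWithSkips(nn.Module):
    """Sketch: a reduced stack of decoder blocks with extra additive skip
    connections between chosen layers. `make_block` is a placeholder for
    whichever block class the base implementation provides."""

    def __init__(self, make_block, n_layers=8, skips=((1, 5), (2, 6))):
        super().__init__()
        self.blocks = nn.ModuleList(make_block() for _ in range(n_layers))
        self.skips_into = {}          # target layer -> list of source layers
        for src, dst in skips:
            self.skips_into.setdefault(dst, []).append(src)
        self.skip_sources = {src for src, _ in skips}

    def forward(self, x):
        saved = {}                    # outputs of layers that feed a skip
        for i, block in enumerate(self.blocks):
            for src in self.skips_into.get(i, []):
                x = x + saved[src]    # add the earlier layer's output
            x = block(x)
            if i in self.skip_sources:
                saved[i] = x
        return x
```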
Best regards
Thomas