Hi, I’d like to create a smaller Llama model based on the LLaMA2-7b-hf model. In particular, I’d like to reduce the number of layers and add skip connections between certain layers. Can anyone help me do this properly?
Do you have a dataset on which the outputs should be similar?
If so, you can train your smaller model to match the original one’s outputs (and likely also its intermediate activations).
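If you go that route, the loss could look something like this. Just a sketch: it assumes the model outputs expose `.logits` and `.hidden_states` (e.g. Hugging Face models called with `output_hidden_states=True`), and the weighting and layer pairing are arbitrary placeholders.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, layer_map, alpha=0.5, T=2.0):
    """Sketch: match the teacher's output distribution (KL on softened
    logits) plus a few intermediate hidden states (MSE).
    `layer_map` pairs student layer indices with teacher layer indices."""
    kl = F.kl_div(
        F.log_softmax(student_out.logits / T, dim=-1),
        F.softmax(teacher_out.logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hidden = sum(
        F.mse_loss(student_out.hidden_states[s], teacher_out.hidden_states[t])
        for s, t in layer_map
    )
    return alpha * kl + (1 - alpha) * hidden
```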
Best regards
Thomas
Sorry if the original question wasn’t clear. I’m not looking for similar outputs from the smaller model at this point. I just want to create a smaller model that has the same structure as llama2-7b, except that it has fewer layers and additional skip connections. Other than that, the two models should be the same (e.g. embedding layer, lm_head, embedding dimension, tokenizer, etc.).
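Concretely, for the “fewer layers, same embeddings/lm_head/tokenizer” part, something like this is what I have in mind (just a rough sketch; the layer count and the way the remaining layers are initialised are placeholders, and the skip connections would still need a modified forward pass):

```python
from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM

name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)   # unchanged tokenizer
base = LlamaForCausalLM.from_pretrained(name)

# same config (hidden size, vocab, heads, ...) but fewer decoder layers
config = LlamaConfig.from_pretrained(name)
config.num_hidden_layers = 16                     # placeholder: keep half

small = LlamaForCausalLM(config)

# reuse the embedding layer, final norm and lm_head of the full model
small.model.embed_tokens.load_state_dict(base.model.embed_tokens.state_dict())
small.model.norm.load_state_dict(base.model.norm.state_dict())
small.lm_head.load_state_dict(base.lm_head.state_dict())

# e.g. initialise the remaining blocks from every other original layer
for i, layer in enumerate(small.model.layers):
    layer.load_state_dict(base.model.layers[2 * i].state_dict())
```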
The easiest would likely be to take a very simple implementation and modify it to your needs.
The ones I’m most aware of are Lit-GPT (disclaimer: I contributed a patch or two there) and A. Karpathy’s llama2.c (which has a Python model for training and a training example on smaller configurations).
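For the skip connections themselves, the change is mostly in the layer loop of the forward pass. A rough sketch of the idea (the block constructor, layer count, and skip pairs are placeholders, not anything specific from Lit-GPT or llama2.c):

```python
import torch.nn as nn

class DecoderStackWithSkips(nn.Module):
    """Sketch: a reduced stack of decoder blocks with extra additive skip
    connections between chosen layers. `make_block` is a placeholder for
    whichever block class the base implementation provides."""

    def __init__(self, make_block, n_layers=8, skips=((1, 5), (2, 6))):
        super().__init__()
        self.blocks = nn.ModuleList(make_block() for _ in range(n_layers))
        self.skips_into = {}          # target layer -> list of source layers
        for src, dst in skips:
            self.skips_into.setdefault(dst, []).append(src)
        self.skip_sources = {src for src, _ in skips}

    def forward(self, x):
        saved = {}                    # outputs of layers that feed a skip
        for i, block in enumerate(self.blocks):
            for src in self.skips_into.get(i, []):
                x = x + saved[src]    # add the earlier layer's output
            x = block(x)
            if i in self.skip_sources:
                saved[i] = x
        return x
```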
Best regards
Thomas