To understand LLM concepts in depth alongside the code, I decided to build a small LLM from scratch.
I have completed the coding part, but I think I have some conceptual errors, because of which I was not able to achieve my desired task.
The model is an attention-based transformer. The aim is a low-level, small-scale LLM (due to resource constraints I am happy with even a very basic LLM, since my goal is to learn the concepts along with the code).
The model should be able to do at least one of the following tasks:
- text completion
- text generation (i.e. generating output that is closely related to the input)
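For context, the part of a decoder-only transformer that I understand to be easiest to get conceptually wrong for both of these tasks is the causal (masked) self-attention. This is a minimal NumPy sketch of my own (an illustration, not the code from my repo) of single-head scaled dot-product attention with a causal mask:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (seq_len, d_model) input."""
    q = x @ Wq  # queries
    k = x @ Wk  # keys
    v = x @ Wv  # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # (seq_len, seq_len) similarity scores
    # Causal mask: position i may only attend to positions <= i.
    # Without this the model "sees the future" during training, and
    # generation at inference time degenerates.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    # Row-wise softmax over the masked scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(5, d))
out, w = causal_self_attention(x, *(rng.normal(size=(d, d)) for _ in range(3)))
# Every attention row should assign (near-)zero weight to future positions
print(np.allclose(np.triu(w, k=1), 0))
```

If the equivalent of this mask is missing (or applied after the softmax instead of before), training loss can still go down while the model never learns usable left-to-right completion, which would match the symptoms I am seeing.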
If everything goes well, I am thinking about fine-tuning it on a standard chat dataset, so I can get a chatbot of my own from scratch.
Although I coded the model with the help of GPT and some online resources, I think I have missed something important (a conceptual error), which is why I was not able to achieve the expected results.
My model: llmwithtransformers