Transformer Model for Language Modelling in NLP

To understand LLMs more deeply (both the concepts and the code), I decided to build a small LLM from scratch.
I have completed the coding, but I suspect I have made some conceptual errors, because the model does not achieve the task I intended.

Purpose of Model

The model is an attention-based transformer. The goal is a small-scale, low-level LLM; due to resource constraints I am happy with even a very basic one, since my aim is to learn the concepts along with the code.
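For reference, the core building block of such a model is causal self-attention. Here is a minimal single-head NumPy sketch (the function and weight names are illustrative, not taken from my actual code):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a sequence x of shape (T, d).

    Wq, Wk, Wv are (d, d) projection matrices (illustrative shapes).
    """
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)            # (T, T) attention logits
    # Causal mask: position t may only attend to positions <= t
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = softmax(scores, axis=-1)
    return weights @ v
```

A quick sanity check for this kind of layer: perturbing a later token must not change the outputs at earlier positions, otherwise the language-modelling objective leaks future information.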

Expected Working

The model should be able to do at least one of the following tasks:

  1. text completion
  2. text generation (i.e. generating output that is closely related to the input)

If everything goes well, I am thinking about fine-tuning it on a standard chat dataset, so that I end up with a chatbot of my own, built from scratch.

Problem

I coded the model with the help of GPT and some online resources, but I think I have missed something important conceptually, which is why I was not able to achieve the expected results.

Model

My Model - llmwithtransformers