Possible Logical Reasons for Wrapping Examples with Special Tokens (SOS | EOS) While Preparing a Training Dataset for an LLM

I’m in the process of training a small language model. I’m stuck at the data preparation step, where I’ve seen that it’s standard practice to wrap each example with special tokens (SOS, EOS).

I’d appreciate an explanation of the logic behind this practice, and also how it would affect model performance if we don’t wrap the data with special tokens.

For reference, I’m attaching a sample of my data to showcase my understanding: I wrapped it with the SOS token and tried to generate the X (input) and Y (output) tensors. A simplified sketch of what I mean is below.
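Here’s a rough sketch of what I’m doing (simplified, with made-up token IDs just for illustration):

```python
import torch

# Toy example: made-up token IDs just to illustrate the idea
SOS_ID = 1   # start-of-sequence token
EOS_ID = 2   # end-of-sequence token

def build_xy(token_ids):
    """Wrap one example with SOS/EOS, then build next-token (X, Y) pairs."""
    ids = torch.tensor([SOS_ID] + token_ids + [EOS_ID], dtype=torch.long)
    x = ids[:-1]   # model input: everything up to (but not including) the last token
    y = ids[1:]    # target: the same sequence shifted left by one position
    return x, y

# e.g. a sentence tokenized to [10, 11, 12]
x, y = build_xy([10, 11, 12])
print(x)  # tensor([ 1, 10, 11, 12])
print(y)  # tensor([10, 11, 12,  2])
```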

Looking forward to hearing from the community :slight_smile:
Cheers!