Where are input IDs and attention masks stored during very large-scale fine-tuning?

Hi, I am working on a project that involves fine-tuning a Sentence Transformers model on 1B pairs. How are pairs at this scale usually processed in practice? It seems wasteful to load the sentences from a generator and re-tokenize every one of them in each epoch. Is there a practical way to do this? Please help. Thanks
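For context, here is a minimal sketch of one approach I'm considering: tokenize once in an offline pass, write the fixed-shape `input_ids` and `attention_mask` arrays to disk, and memory-map them each epoch so nothing is re-tokenized. The whitespace "tokenizer", file names, and tiny pair list below are placeholders, not my real setup:

```python
import numpy as np

# Toy whitespace "tokenizer" standing in for a real one (placeholder);
# in practice the model's tokenizer would run once, offline.
def tokenize(text, vocab, max_len=8):
    ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in text.split()][:max_len]
    attn = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [0] * pad, attn + [0] * pad

pairs = [("hello world", "hi there"), ("a b c", "d e")]  # stands in for 1B pairs
max_len, vocab = 8, {}

# One-off preprocessing pass: write fixed-shape arrays straight to disk.
n = len(pairs)
ids_mm = np.memmap("input_ids.dat", dtype=np.int32, mode="w+", shape=(n, 2, max_len))
attn_mm = np.memmap("attention_mask.dat", dtype=np.int8, mode="w+", shape=(n, 2, max_len))
for i, (a, b) in enumerate(pairs):
    for j, text in enumerate((a, b)):
        ids, attn = tokenize(text, vocab, max_len)
        ids_mm[i, j], attn_mm[i, j] = ids, attn
ids_mm.flush()
attn_mm.flush()

# Every epoch: reopen read-only; the OS pages data in lazily, so nothing
# is re-tokenized and resident memory stays flat even for huge corpora.
ids_epoch = np.memmap("input_ids.dat", dtype=np.int32, mode="r", shape=(n, 2, max_len))
print(ids_epoch[0, 0])
```

Is something along these lines reasonable at the 1B scale, or is on-the-fly tokenization with a streaming dataset the more common pattern?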