Where are input IDs and attention masks stored during very large-scale fine-tuning?

Hi, I am working on a project that involves fine-tuning a Sentence Transformers model on 1B pairs. How are pairs at this scale usually processed in practice? It seems wasteful to load the sentences from a generator and re-tokenize every one of them in each epoch. Is there a practical way to do this? Please help. Thanks
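For context, here is a minimal sketch of one approach I'm considering: tokenize once in an offline pass, write the fixed-shape `input_ids` and `attention_mask` arrays to disk, and memory-map them each epoch so nothing is re-tokenized. The whitespace "tokenizer", file names, and tiny pair list below are placeholders, not my real setup:

```python
import numpy as np

# Toy whitespace "tokenizer" standing in for a real one (placeholder);
# in practice the model's tokenizer would run once, offline.
def tokenize(text, vocab, max_len=8):
    ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in text.split()][:max_len]
    attn = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [0] * pad, attn + [0] * pad

pairs = [("hello world", "hi there"), ("a b c", "d e")]  # stands in for 1B pairs
max_len, vocab = 8, {}

# One-off preprocessing pass: write fixed-shape arrays straight to disk.
n = len(pairs)
ids_mm = np.memmap("input_ids.dat", dtype=np.int32, mode="w+", shape=(n, 2, max_len))
attn_mm = np.memmap("attention_mask.dat", dtype=np.int8, mode="w+", shape=(n, 2, max_len))
for i, (a, b) in enumerate(pairs):
    for j, text in enumerate((a, b)):
        ids, attn = tokenize(text, vocab, max_len)
        ids_mm[i, j], attn_mm[i, j] = ids, attn
ids_mm.flush()
attn_mm.flush()

# Every epoch: reopen read-only; the OS pages data in lazily, so nothing
# is re-tokenized and resident memory stays flat even for huge corpora.
ids_epoch = np.memmap("input_ids.dat", dtype=np.int32, mode="r", shape=(n, 2, max_len))
print(ids_epoch[0, 0])
```

Is something along these lines reasonable at the 1B scale, or is on-the-fly tokenization with a streaming dataset the more common pattern?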