How to create a custom dataset when the order and total number of training samples are not known in advance?

I have a 42 GB JSONL file. Every line of the file is a JSON object, and I create training samples from each object. The number of training samples per object varies between 0 and 5, so I don't know the total number of samples, or which sample comes from which line, until the file has been processed. What is the best way to create a custom dataset without reading the entire JSONL file into memory?
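
To make the setup concrete, here is a minimal sketch of the kind of streaming access I have in mind: read one line at a time and yield however many samples that line produces. `stream_samples`, `make_samples`, and the `"id"`/`"texts"` fields are hypothetical names used only for illustration, not my real schema. If the framework is PyTorch (an assumption, since nothing above depends on it), a generator like this could presumably serve as the body of `__iter__` in a `torch.utils.data.IterableDataset` subclass, which does not need `__len__` or index-based access.

```python
import json
from typing import Any, Callable, Iterable, Iterator


def make_samples(record: dict) -> list:
    # Hypothetical sample builder: one (id, text) pair per entry in an
    # assumed "texts" field, capped at 5 samples per JSON object.
    return [(record.get("id"), text) for text in record.get("texts", [])][:5]


def stream_samples(
    path: str, build: Callable[[dict], Iterable[Any]]
) -> Iterator[Any]:
    """Lazily yield training samples from a JSONL file, one line at a time."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # iterating the file object never loads it whole
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Each record may contribute zero or more samples.
            yield from build(record)
```

With this shape, only the current line and its samples are held in memory at once; the trade-off is that the total sample count and random access by index are not available without a separate pass over the file.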