I am trying to train GPT-2 on a custom dataset. The training text file is over 100 GB, so creating the Dataset for training takes very long, and I was hoping there is a way to save the processed dataset for later reuse instead of rebuilding it every run.
The dataset is a torch `Dataset`.
I am using the Hugging Face blog post on how to train transformers from scratch as a guide (https://huggingface.co/blog/how-to-train).
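For context, here is roughly what I have in mind: tokenize once, cache the token-id tensors to disk with `torch.save`, and on later runs rebuild the `Dataset` from the cached file with `torch.load`. This is only a sketch — the `tokenize` function below is a stand-in for the real tokenizer from the blog post, and all names are illustrative. Is this a reasonable approach?

```python
import torch
from torch.utils.data import Dataset

class CachedTextDataset(Dataset):
    """Dataset over pre-tokenized examples that can be cached and reloaded."""
    def __init__(self, examples):
        self.examples = examples  # list of 1-D LongTensors of token ids

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        return self.examples[i]

def tokenize(line):
    # Stand-in for the real GPT-2 tokenizer; illustrative only.
    return torch.tensor([ord(c) for c in line], dtype=torch.long)

# First run: tokenize the raw text (the slow part for a 100 GB file),
# then cache the result to disk so it never has to be redone.
examples = [tokenize(line) for line in ["hello", "world"]]
torch.save(examples, "cached_examples.pt")

# Later runs: skip tokenization entirely and load the cache.
cached = torch.load("cached_examples.pt")
ds = CachedTextDataset(cached)
```

My worry is whether caching everything in one file like this scales to 100 GB, or whether the cache needs to be sharded somehow.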
Thank you.