Batch prediction on a big dataset

Hello all, I am pretty new to PyTorch, so I hope this is not too dumb a question, but I am running into a problem with prediction on a dataset using my trained PyTorch model. I have a trained model for sentiment analysis, and training itself went well. But now I need to apply it to a dataset of about 1.1 million unlabeled texts. I have tried doing so a few times, but RAM usage ends up so high that either my Colab session or my own computer crashes.
Someone suggested running the prediction in batches, but I don’t understand how to do that. Could someone point me in the right direction for predicting new labels in this use case? Any help would be greatly appreciated!

I’m not sure if your use case has specific requirements, but the common approach would be to use a Dataset, wrap it in a DataLoader, and process each batch with the model, roughly as in the sketch below.
The data loading tutorial might be a good starting point.
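
Here is a minimal sketch of that loop. Names such as `texts`, `encode`, `model`, and `device` are placeholders for your own data and code, and the model is assumed to return raw class logits:

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Assumptions: `texts` is a list of raw strings, `encode` is your own
# preprocessing that turns one text into a fixed-size tensor of token ids,
# and `model` / `device` already exist from your training setup.
class InferenceDataset(Dataset):
    def __init__(self, texts):
        self.texts = texts

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Return one preprocessed sample; the DataLoader's default
        # collate function stacks these into a batch tensor.
        return encode(self.texts[idx])

loader = DataLoader(InferenceDataset(texts), batch_size=64)

model.eval()                      # disable dropout etc.
all_preds = []
with torch.no_grad():             # no gradients needed for inference
    for batch in loader:
        batch = batch.to(device)
        logits = model(batch)     # assumed shape: (batch_size, num_classes)
        # Move only the small prediction tensor back to the CPU so memory
        # usage stays bounded by the batch size, not the dataset size.
        all_preds.append(logits.argmax(dim=-1).cpu())

all_preds = torch.cat(all_preds)  # one predicted label per input text
```

The key points are calling `model.eval()` and wrapping the loop in `torch.no_grad()`, and only keeping the small prediction tensors around, so memory usage depends on the batch size rather than on all 1.1 million texts at once.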

Thank you for the suggestion! I am using the method suggested here for training the model: ML-and-Data-Analysis/RoBERTa for text classification.ipynb at master · aramakus/ML-and-Data-Analysis · GitHub
It saves the model as two files: model.pkl and metrics.pkl.
Basically, I am trying to predict new labels using this model. Is the DataLoader then the right approach?

If you would like to create predictions for a whole dataset, using a DataLoader sounds like a good approach. On the other hand, if you would like to get a prediction for a single sample, you can pass it directly to the model, so the “best” approach depends a bit on your use case.
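
For the single-sample case, something like this sketch would do (assuming `sample` is one already-preprocessed input tensor and `model` returns logits, as above):

```python
import torch

model.eval()
with torch.no_grad():
    logits = model(sample.unsqueeze(0))  # add a batch dimension of 1
    pred = logits.argmax(dim=-1)         # predicted class for this sample
```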