Reuse DataLoader across epochs without reloading from disk

My training code looks like the following:

for epoch in range(n_epoch):
    for i, batch in enumerate(dataset):
        train(batch)

dataset here is created using DataLoader, and I have only one data file. As far as I can tell, the DataLoader loads everything in this file into memory and then extracts batches from it. I want to train on this data file for multiple epochs, but I noticed that every time a new epoch begins, the file is loaded from disk again, and that loading takes a lot of time. How can I load the file only once, during the first epoch, and then reuse the in-memory data for the following epochs instead of fetching it from disk repeatedly? Thanks a lot!

I assume you are loading the data in the Dataset.__init__ method?
If that’s the case, you could preload the data before creating the Dataset and pass it to its __init__ method:

import torch
from torch.utils.data import DataLoader

data = torch.load(...)        # load the file from disk exactly once
dataset = MyDataset(data)     # hand the in-memory data to __init__
loader = DataLoader(dataset)

This should avoid reloading the data. One caveat: Dataset.__init__ only runs when the Dataset object is constructed, so make sure you create the Dataset and DataLoader once, outside of the epoch loop. If they are recreated inside the loop, any heavy loading in __init__ will run again at every epoch.
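Since MyDataset isn't shown in the thread, here is a minimal, self-contained sketch of the pattern I mean, assuming the file stores a dict with "inputs" and "targets" tensors (the file name, keys, and hyperparameters below are illustrative, not from your post):

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    """Serves samples from tensors that are already in memory."""
    def __init__(self, data):
        # `data` was loaded once, outside this class; no disk access here.
        self.inputs = data["inputs"]
        self.targets = data["targets"]

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # Pure in-memory indexing; called once per sample.
        return self.inputs[idx], self.targets[idx]

data = torch.load("train_data.pt")  # hypothetical file; read once, up front
dataset = MyDataset(data)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for epoch in range(10):  # iterating the loader again does not touch the disk
    for i, (inputs, targets) in enumerate(loader):
        pass  # your train() step goes here

As a side note, if you ever set num_workers > 0, the worker processes are restarted at the beginning of each epoch and each one receives a copy of the dataset; in newer PyTorch versions, persistent_workers=True keeps them alive between epochs.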

Hi @ptrblck, thank you for your reply! The training uses the same data file and trains on it for many epochs. How can I avoid reloading the file at every new epoch and instead reuse the data loaded in the previous epoch? Thanks!

Is my suggestion not working for you?
If it isn't, could you tell me what goes wrong when you try it?