DataLoader should Cache the same index computation?


I have a DataLoader whose Dataset performs some computation for each index (e.g. computing mean features).

Doing this computation again and again is slowing down training. Is there any way the DataLoader can avoid recomputing it for indices it has already seen?

The obvious solution is to do that computation outside the DataLoader; however, any other solution (like having a separate function from `__getitem__`) is welcome.

As you said, you could compute these features offline once and store them as an attribute inside the Dataset.
If you want to compute them lazily during the first epoch using your DataLoader, you could use a shared array and store the features there (as shown here), so that values written by each worker are reused in the following epochs.
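A minimal sketch of the shared-array approach: the cache tensors are moved into shared memory with `share_memory_()`, so entries filled in by DataLoader worker processes during the first epoch remain visible in later epochs. The class and the mean computation here are illustrative assumptions, not your actual Dataset.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class CachedFeatureDataset(Dataset):
    def __init__(self, data):
        self.data = data
        # Shared-memory cache: one feature value per sample, plus a flag
        # marking which entries have already been computed.
        self.cache = torch.zeros(len(data))
        self.valid = torch.zeros(len(data), dtype=torch.bool)
        self.cache.share_memory_()
        self.valid.share_memory_()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        if not self.valid[idx]:
            # Expensive per-index computation, done at most once per index.
            self.cache[idx] = self.data[idx].mean()
            self.valid[idx] = True
        return self.data[idx], self.cache[idx]


if __name__ == "__main__":
    data = torch.arange(12.0).reshape(4, 3)
    ds = CachedFeatureDataset(data)
    loader = DataLoader(ds, batch_size=2)
    for epoch in range(2):  # second epoch reads cached values
        for x, feat in loader:
            pass
```

Note that in-place writes to a shared tensor from multiple workers are only safe here because each index is written with the same deterministic value, so a duplicate computation is wasted work rather than a correctness problem.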
