Hello, I have around 400 GB of image data stored in n .nc files (I am working with xarray). I instantiate n Dataset objects, one per file, and combine them with ConcatDataset. I do not want to load them all into memory at once, so I lazy-load each file inside __getitem__ using the logic below:
    if not self.isDataLoaded:
        self.data = xr.load_dataset(self.data_file, engine="h5netcdf")
        self.isDataLoaded = True
With shuffle=False, my expectation was that only one file would be loaded and iterated over before moving on to the next one. However, all the files are loaded into memory at the beginning of the epoch, leading to OOM issues. I would understand if memory usage increased gradually during the epoch, because I have not yet figured out how to detect the end of iteration for a single dataset and free its loaded data. Could anyone spot an issue here? Thanks!
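To make the expectation concrete, here is a minimal pure-Python sketch of the behaviour I am after. It is not my actual code and does not use PyTorch or xarray: `LazyFileDataset`, `SequentialConcat`, `unload` and `fake_loader` are hypothetical names, and `fake_loader` stands in for `xr.load_dataset(self.data_file, engine="h5netcdf")`. It also assumes each file's length is known up front, so that `__len__` never has to touch the data, and that indices are visited in order in a single process.

```python
import bisect


class LazyFileDataset:
    """Stand-in for one per-file Dataset (hypothetical, not my real class)."""

    def __init__(self, data_file, length, loader):
        self.data_file = data_file
        self.length = length   # assumed known up front, so __len__ never loads the file
        self.loader = loader   # stand-in for xr.load_dataset(..., engine="h5netcdf")
        self.data = None

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.data is None:  # lazy load on first access
            self.data = self.loader(self.data_file)
        return self.data[idx]

    def unload(self):
        self.data = None       # drop the reference so the memory can be reclaimed


class SequentialConcat:
    """Maps a global index to (dataset, local index) via cumulative sizes and
    unloads the previously used dataset once access moves to another one."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        self.cumulative_sizes = []
        total = 0
        for d in self.datasets:
            total += len(d)
            self.cumulative_sizes.append(total)
        self.current = None    # index of the dataset used by the last access

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self):
            raise IndexError(idx)
        ds_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        local = idx if ds_idx == 0 else idx - self.cumulative_sizes[ds_idx - 1]
        if self.current is not None and self.current != ds_idx:
            self.datasets[self.current].unload()
        self.current = ds_idx
        return self.datasets[ds_idx][local]


# Tiny demo with a fake loader that records which files get loaded.
loaded = []


def fake_loader(path):
    loaded.append(path)
    return [f"{path}:{i}" for i in range(3)]


parts = [LazyFileDataset(f"file{k}.nc", 3, fake_loader) for k in range(2)]
concat = SequentialConcat(parts)
items = [concat[i] for i in range(len(concat))]
```

In this sketch each file is loaded only when its first index is requested, and the previous file is released as soon as the index crosses into the next one. That is the access pattern I expected from ConcatDataset with shuffle=False, which is why loading everything at the start of the epoch surprises me.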