I have many .npz datasets (images and labels). Each file is very large, so I can't load them all into a DataLoader at once (I only have 32 GB of memory).
Since there are multiple .npz files, each with a different length (at most 5000 samples), what is the best way to implement dynamic loading of the data?
The only approach I have come up with is the dumbest one: store a list of the .npz file names, iterate through it, and build a new dataset and DataLoader for each file, e.g.:
file_list = [file1, file2, file3, file4, ...]
for file in file_list:
    dataset = ...      # build a Dataset from the current .npz file
    dataloader = ...   # wrap it in a DataLoader
    # ... train on this file ...
    del dataset, dataloader
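For reference, here is a minimal runnable sketch of that per-file approach, assuming each .npz file holds two arrays named "images" and "labels" (the key names, file paths, batch size, and the dummy training step are just placeholders, not my actual code):

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class SingleNpzDataset(Dataset):
    """Loads one .npz file fully into memory and serves samples from it."""
    def __init__(self, npz_path):
        with np.load(npz_path) as data:   # reads (and decompresses) the whole archive
            self.images = data["images"]  # assumed key name
            self.labels = data["labels"]  # assumed key name

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x = torch.from_numpy(self.images[idx]).float()
        y = torch.as_tensor(self.labels[idx]).long()
        return x, y

file_list = ["file1.npz", "file2.npz", "file3.npz"]  # placeholder paths

for path in file_list:
    dataset = SingleNpzDataset(path)  # expensive: rereads 1-3 GB from disk every time
    dataloader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=2)
    for images, labels in dataloader:
        pass  # the actual training step would go here
    del dataset, dataloader  # free the memory before loading the next file

The expensive part is the full read/decompression of the archive in SingleNpzDataset.__init__, which is repeated for every file (and again every epoch).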
But that creates another problem: the .npz files are so big (roughly 1-3 GB each) that loading one takes a huge amount of time every time I rebuild the dataset.
Is there a better solution for this scenario?