Loading huge data functionality


(Yu Yu) #21

Hi, did you solve the “unable to mmap memory: you tried to mmap 0GB” problem? If so, how? I am running into the same issue now.


#22

This approach works well for images but doesn’t fit text. In NLP, the data usually lives in a single file with many lines rather than one image per file. How should the Dataset be customized for that case? (A rough sketch of one option follows below.)
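Not an answer from the thread, but a minimal sketch of one common pattern: index the byte offset of every line once, then let `__getitem__` seek to a single line lazily, so the whole file is never held in memory. The file name `corpus.txt` is just a placeholder.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class LineTextDataset(Dataset):
    """Treat each line of one large text file as a sample, read lazily."""

    def __init__(self, path):
        self.path = path
        # Record the byte offset of every line once, so __getitem__ can
        # seek straight to a line instead of loading the whole file.
        self.offsets = []
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        with open(self.path, "rb") as f:
            f.seek(self.offsets[idx])
            line = f.readline().decode("utf-8").rstrip("\n")
        return line  # tokenize / convert to tensors here as needed

# loader = DataLoader(LineTextDataset("corpus.txt"), batch_size=32, shuffle=True)
```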


Custom Dataset error: unable to mmap memory: you tried to mmap 0GB
(Zizhuo Ren) #23

I’m running into the same problem here. Have you solved it yet?


(Will) #24

I have the same problem here. Have you solved it?


(Solomon K ) #25

This has been reported here:


(Vidyasagar Ranganaboina) #27

Hi NgPDat,

Thanks for your response. I am running into a situation where I am trying to load data from many CSV files, each holding part of the dataset. Each CSV file contains, say, 2000 rows, and I have 4000 such files. I am trying to load the data in batches by iterating through the CSV files. I implemented a load_csv() function that returns the contents of a file and wrote a custom Dataset. But when I specify batch size = 50, each iteration loads 50 files into memory and returns a batch of shape 50 × 2000 × 300 (files × rows per file × columns). What changes should I make so that the DataLoader returns only 50 rows rather than the data from 50 CSV files?
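Not a reply from the thread, but a minimal sketch of one way to do this, assuming every file has the same number of rows: make `__len__` count rows across all files and map a global row index to a (file, row) pair in `__getitem__`, so the DataLoader batches individual rows. The class name, `rows_per_file=2000`, and the file layout are placeholders.

```python
import glob
import pandas as pd
from torch.utils.data import Dataset, DataLoader

class CsvRowsDataset(Dataset):
    """Treat every row across many CSV files as one sample."""

    def __init__(self, csv_paths, rows_per_file=2000):
        self.csv_paths = list(csv_paths)
        self.rows_per_file = rows_per_file
        self._cache_idx = None   # index of the currently loaded file
        self._cache_df = None    # its contents

    def __len__(self):
        return len(self.csv_paths) * self.rows_per_file

    def __getitem__(self, idx):
        file_idx, row_idx = divmod(idx, self.rows_per_file)
        # Keep the most recently used file in memory so sequential
        # access does not reopen the CSV for every single row.
        if self._cache_idx != file_idx:
            self._cache_df = pd.read_csv(self.csv_paths[file_idx])
            self._cache_idx = file_idx
        return self._cache_df.iloc[row_idx].to_numpy()

# paths = sorted(glob.glob("data/part_*.csv"))  # hypothetical file layout
# loader = DataLoader(CsvRowsDataset(paths), batch_size=50)
```

With this indexing, batch_size=50 yields 50 rows (shape 50 × 300 here) instead of 50 whole files.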