No, I just created a dummy dataframe and saved it to disk so that I could load it with pd.read_hdf.
You should add the read line to your __getitem__ method and store the file path in __init__.
I see.
So if I just store the file path in __init__ and read an indexed slice of the h5 file in __getitem__, like [0:10],
then only that [0:10] slice will be loaded into memory, not the whole h5 file.
Am I right?
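Exactly. A minimal sketch of that lazy-loading pattern (the class name H5Dataset, the file name dummy.h5, and the key "data" are placeholders; pd.read_hdf only supports partial reads with start/stop when the file was saved in format="table"):

```python
import numpy as np
import pandas as pd

# Build a dummy HDF5 file in "table" format so it supports partial reads.
df = pd.DataFrame(np.arange(100).reshape(20, 5), columns=list("abcde"))
df.to_hdf("dummy.h5", key="data", format="table")

class H5Dataset:
    """Lazy dataset: only the requested rows are read from disk."""

    def __init__(self, path, key="data"):
        self.path = path  # store the path, not the data
        self.key = key

    def __getitem__(self, idx):
        # Read just row [idx, idx + 1) from disk; the rest of the
        # file is never loaded into memory.
        return pd.read_hdf(self.path, key=self.key, start=idx, stop=idx + 1)

ds = H5Dataset("dummy.h5")
row = ds[3]  # a one-row DataFrame holding only row 3
```

The key point is that __init__ does no I/O at all, so workers that fork the dataset stay cheap, and each __getitem__ touches only the rows it asks for.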
Hi, I have a similar scenario. I analyse genomic data, and each sequence can be represented as a vector of roughly 50-1000 dimensions (the larger the better). In short:
I generate features with a separate pipeline, which produces a text file with one vector per line.
In this case, do I have to save the vectors to an h5 file somehow? Any help would be greatly appreciated.
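You don't have to, but a one-time conversion to HDF5 lets you slice rows lazily instead of loading the whole text file per worker. A sketch using h5py (file names features.txt / features.h5 and the dataset name "vectors" are placeholders; here a random dummy text file stands in for your pipeline's output):

```python
import numpy as np
import h5py

# Write a tiny dummy text file: one whitespace-separated vector per line
# (stand-in for the feature pipeline's real output).
rng = np.random.default_rng(0)
np.savetxt("features.txt", rng.random((100, 50)))

# One-time conversion: text file -> HDF5 dataset.
vectors = np.loadtxt("features.txt")  # shape (100, 50)
with h5py.File("features.h5", "w") as f:
    f.create_dataset("vectors", data=vectors)

# Afterwards, __getitem__ can slice rows without reading the whole file:
with h5py.File("features.h5", "r") as f:
    batch = f["vectors"][0:10]  # only these 10 rows are loaded into memory
```

If your vectors fit comfortably in RAM anyway, np.loadtxt once in __init__ is fine too; the h5 route mainly pays off when the full matrix is too large to hold per worker.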