So tfrecord splits your dataset into several chunks and stores these files in a binary format.
E.g. for 10000 images you could have 10 tfrecord files, each containing 1000 images.
I’m not sure about benefits 1 and 2.
The IO might be limited if you load a lot of small files, but on the other hand you would have to load one huge file before even the first iteration can start. The same goes for point 2: while the GPU is busy, your multiple workers can each load a new batch of images instead of one large file.
Anyway, you could probably emulate such a behavior by loading some images, storing them in a tensor,
and saving it to your file system. Then you would have to create some logic to load a new chunk based on your current index and finally get your sample.
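A minimal sketch of what that chunk loading could look like, assuming you have already saved each chunk as a tensor of shape `[chunk_size, ...]` via `torch.save` (all file names and sizes here are made up for illustration):

```python
import torch
from torch.utils.data import Dataset

class ChunkDataset(Dataset):
    def __init__(self, chunk_paths, chunk_size):
        self.chunk_paths = chunk_paths
        self.chunk_size = chunk_size
        self.cached_idx = None    # index of the currently loaded chunk
        self.cached_chunk = None  # the tensor holding that chunk

    def __len__(self):
        return len(self.chunk_paths) * self.chunk_size

    def __getitem__(self, index):
        # map the global sample index to (chunk index, offset inside chunk)
        chunk_idx, offset = divmod(index, self.chunk_size)
        if chunk_idx != self.cached_idx:
            # only hit the file system when the index leaves the cached chunk
            self.cached_chunk = torch.load(self.chunk_paths[chunk_idx])
            self.cached_idx = chunk_idx
        return self.cached_chunk[offset]
```

Note that this simple cache only pays off if the indices arrive mostly in order (e.g. `shuffle=False`, or shuffling at the chunk level); with fully random indices you would reload a chunk for almost every sample.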
Let me know if I misunderstood your use case or if you need some help figuring out the chunk loading.