PyTorch DataLoader vs TensorFlow TFRecord

Ahmed_m · May 9, 2018, 11:52am

Hi,

I don’t have deep knowledge of TensorFlow, and I read about a utility called ‘TFRecord’. Is it the counterpart to ‘DataLoader’ in PyTorch?

Best Regards

melgor · May 9, 2018, 2:04pm

No, TFRecord is a different thing from DataLoader.
tf.data is the counterpart to DataLoader.
Both of them can read different formats of data (NumPy arrays, text, paths to images).

TFRecord is much more like a database that you create before training and read from during training. The main advantage is that you are not reading many small files but a few bigger files, which should be faster. TFRecord is also a special file format supported natively by TF.
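To illustrate the "few bigger files instead of many small files" idea, here is a minimal sketch in plain Python. It is not the real TFRecord format (which, as far as I know, also stores checksums and usually holds serialized tf.train.Example messages); the helper names `write_records` and `read_records` and the file name are made up for this example.

```python
import os
import struct
import tempfile


def write_records(path, records):
    """Pack many small byte records into one file, each prefixed by its length."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(struct.pack("<Q", len(rec)))  # 8-byte little-endian length
            f.write(rec)


def read_records(path):
    """Stream the records back sequentially from the single packed file."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:  # end of file reached
                break
            (length,) = struct.unpack("<Q", header)
            yield f.read(length)


# 1000 small samples end up in one file instead of 1000 separate files.
samples = [f"sample-{i}".encode() for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "train.records")  # hypothetical file name
write_records(path, samples)
restored = list(read_records(path))
```

Reading is one sequential pass over a single file, which is where the expected speed-up over opening many small files comes from.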

In PyTorch you can use any known database for reading the data. It is up to you which one you choose.

Ahmed_m · May 9, 2018, 2:10pm

The term ‘DataBase’ in the context of PyTorch is a ‘torch.utils.data.Dataset’ class instance … isn’t it?

melgor · May 9, 2018, 3:41pm

No, by database I mean an SQL or LMDB database. Read here: What's the best way to load large data?

So database as an external, general term.

torch.utils.data.Dataset defines how we want to parse and transform the data (e.g. read from LMDB and apply data augmentation).
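A rough sketch of that split, using SQLite from the standard library as the external database (LMDB would follow the same pattern). The table layout, the `SQLiteDataset` class, the `build_demo_db` helper, and the toy transform are all assumptions made up for this example, and the import fallback is only there so the snippet also runs without PyTorch installed.

```python
import os
import sqlite3
import tempfile

try:
    from torch.utils.data import Dataset
except ImportError:
    Dataset = object  # fallback so the sketch still runs without PyTorch


def build_demo_db(path, n=10):
    """Create the database once, before training (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL, label INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?)",
        [(i, i * 0.5, i % 2) for i in range(n)],
    )
    conn.commit()
    conn.close()


class SQLiteDataset(Dataset):
    """Map-style dataset: the database stores the data, the Dataset parses and transforms it."""

    def __init__(self, db_path, transform=None):
        self.db_path = db_path
        self.transform = transform
        self._conn = None  # opened lazily so each worker process gets its own connection
        conn = sqlite3.connect(db_path)
        self._length = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
        conn.close()

    def __len__(self):
        return self._length

    def __getitem__(self, idx):
        if self._conn is None:
            self._conn = sqlite3.connect(self.db_path)
        row = self._conn.execute(
            "SELECT value, label FROM samples WHERE id = ?", (idx,)
        ).fetchone()
        if row is None:
            raise IndexError(idx)
        value, label = row
        if self.transform is not None:
            value = self.transform(value)  # stand-in for data augmentation
        return value, label


db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
build_demo_db(db_path)
dataset = SQLiteDataset(db_path, transform=lambda v: v * 2.0)
sample = dataset[3]
```

With PyTorch installed, this object can be passed to `torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=True)`, which handles batching and shuffling on top of it.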

Ahmed_m · May 9, 2018, 3:42pm

Totally clear now … Thanks a lot