PyTorch DataLoader vs TensorFlow TFRecord

Hi,

I don’t have deep knowledge of TensorFlow, and I read about a utility called ‘TFRecord’. Is it the counterpart to ‘DataLoader’ in PyTorch?

Best Regards

No, TFRecord is a different thing from DataLoader.
tf.data is the counterpart to DataLoader.
Both of them can read data in different formats (NumPy arrays, text, paths to images).

TFRecord is more like a database that you create before training and read from during training. The main advantage is that you read a few large files instead of many small ones, which should be faster. TFRecord is also a dedicated file format supported by TF.
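To make the "create before training, read during training" idea concrete, here is a minimal sketch. The toy data, the file name `toy.tfrecord`, and the feature names `value` and `label` are made up for illustration; a real pipeline would store images or other features and usually split them over several files.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical toy data: five (value, label) pairs.
samples = [(float(i), i % 2) for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), "toy.tfrecord")

# Step 1 (before training): pack all samples into a single TFRecord file.
with tf.io.TFRecordWriter(path) as writer:
    for value, label in samples:
        example = tf.train.Example(features=tf.train.Features(feature={
            "value": tf.train.Feature(float_list=tf.train.FloatList(value=[value])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        }))
        writer.write(example.SerializeToString())

# Step 2 (during training): read the file back through tf.data.
feature_spec = {
    "value": tf.io.FixedLenFeature([], tf.float32),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def parse(serialized):
    # Decode one serialized tf.train.Example into a dict of tensors.
    return tf.io.parse_single_example(serialized, feature_spec)


dataset = tf.data.TFRecordDataset(path).map(parse).batch(2)

labels = []
for batch in dataset:
    labels.extend(batch["label"].numpy().tolist())
```

Here TFRecord is only the storage format, while tf.data does the iterating and batching, which is the part that corresponds to DataLoader.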

In PyTorch you can use any database you like for reading the data. It is up to you which one you choose.

The term ‘DataBase’ in the context of PyTorch means a ‘torch.utils.data.Dataset’ class instance, doesn’t it?

No, by database I mean something like an SQL or LMDB database. Read here: What's the best way to load large data?

So database as an external, general term.

torch.utils.data.Dataset defines how we want to parse and transform the data (e.g. read from LMDB and apply data augmentation).
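A rough sketch of that split, using SQLite from the Python standard library in place of LMDB so that it runs without extra packages. The class name `SQLiteDataset`, the `samples` table, and the doubling "augmentation" are all hypothetical and only for illustration.

```python
import sqlite3

import torch
from torch.utils.data import DataLoader, Dataset


class SQLiteDataset(Dataset):
    """Hypothetical Dataset that reads (value, label) rows from an SQLite table."""

    def __init__(self, conn, transform=None):
        self.conn = conn
        self.transform = transform
        self.length = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # Row ids start at 1, Dataset indices start at 0.
        value, label = self.conn.execute(
            "SELECT value, label FROM samples WHERE id = ?", (idx + 1,)
        ).fetchone()
        x = torch.tensor([value], dtype=torch.float32)
        if self.transform is not None:
            x = self.transform(x)  # e.g. data augmentation
        return x, label


# The external "database": created once, before training.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL, label INTEGER)")
conn.executemany(
    "INSERT INTO samples (value, label) VALUES (?, ?)",
    [(float(i), i % 2) for i in range(6)],
)
conn.commit()

# The Dataset decides how rows are parsed and transformed;
# the DataLoader only handles batching and iteration.
dataset = SQLiteDataset(conn, transform=lambda x: x * 2.0)
loader = DataLoader(dataset, batch_size=3, shuffle=False)
batches = [(x, y) for x, y in loader]
```

This sketch keeps `num_workers` at its default of 0. With several worker processes, a single connection should not be shared; each worker would need to open its own.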

Totally clear now. Thanks a lot!