Dataloader for a folder with multiple files. PyTorch solution that is equivalent to TFRecordDataset in TF2.0

Hey Yin,

I think there is no ‘easy’ way to do it (i.e. a function which does it efficiently for you in PyTorch). I had a similar problem, and what I did was run a separate job which converted the tfrecords to .avro and saved them, and then I read the avro files into the dataloader. I did it that way because it was the easiest option for me, since my data is usually in avro/parquet anyway. Probably the easiest way for you is to build a custom function (in place of custom_fn in the AvroDataReader object in the snippet above) which converts each tfrecord to something like tf.float32 and then to a torch tensor (but I don’t know whether you will need a tf session for that).
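If you would rather skip the conversion job, here is a rough sketch of reading the records straight from a folder in plain Python, without TensorFlow. It relies only on the TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC). The names iter_tfrecord_bytes and iter_folder and the "*.tfrecord" pattern are my own placeholders, not part of any library, and the sketch skips the CRC checks and does not decode the payload (usually a serialized tf.train.Example), so treat it as a starting point only.

```python
import glob
import os
import struct


def iter_tfrecord_bytes(path):
    """Yield the raw payload bytes of every record in one TFRecord file."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # uint64 little-endian payload length
            if not header:
                return  # clean end of file
            if len(header) < 8:
                raise IOError(f"truncated record header in {path}")
            (length,) = struct.unpack("<Q", header)
            f.read(4)  # masked CRC of the length, not verified in this sketch
            payload = f.read(length)
            if len(payload) < length:
                raise IOError(f"truncated record payload in {path}")
            f.read(4)  # masked CRC of the payload, not verified in this sketch
            yield payload


def iter_folder(folder, pattern="*.tfrecord"):
    """Yield record payloads from every matching file, in sorted file order.

    The default pattern is an assumption; adjust it to your file names.
    """
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        yield from iter_tfrecord_bytes(path)
```

To feed this to a DataLoader you could subclass torch.utils.data.IterableDataset and return iter_folder(folder) from __iter__, decoding each payload into tensors there. Note that with num_workers > 0 every worker would replay the whole folder unless you split the file list between workers yourself.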

This thread looks very relevant.