Reading webdataset with TorchData

Hi, I’ve been unable to figure out how to read webdataset tars with TorchData. I have images and annotations generated in webdataset format with jpg and cls keys. How can I pair up the jpg and cls items in the tar that are located next to each other?

It depends on how the files are stored, but here are the DataPipes that will be helpful:

  1. FileLister lists out all your files in your directories in order.
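  2. TarArchiveLoader (.load_from_tar) reads your TAR archives and yields a (path, stream) pair for each file inside. It expects opened archives as input, so put a FileOpener in binary mode between it and FileLister.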
  3. Grouper (.groupby) groups files that share the same name (ignoring the extension) together. After that, you can open the files, transform them, and return one sample at a time as you see fit.
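
Here is a rough sketch of how those pieces might fit together. It assumes every sample in the tar is exactly one .jpg plus one .cls with the same base name, and that the two are stored next to each other. The helper names (sample_key, to_sample, build_pipeline) are made up for illustration and are not part of TorchData, so adjust to your layout:

```python
import os


def sample_key(item):
    """Group key: the member path without its final extension.

    'shard.tar/0001.jpg' and 'shard.tar/0001.cls' map to the same key.
    """
    path, _stream = item
    return os.path.splitext(path)[0]


def to_sample(group):
    """Turn a group of (path, stream) pairs into a dict keyed by extension."""
    sample = {}
    for path, stream in group:
        ext = os.path.splitext(path)[1].lstrip(".")
        sample[ext] = stream.read()  # raw bytes; decode the image/label later
    return sample


def build_pipeline(root):
    """Assumed layout: `root` is a directory containing *.tar shards."""
    # Imported here so the helpers above can be used without torchdata installed.
    from torchdata.datapipes.iter import FileLister, FileOpener

    dp = FileLister(root, masks="*.tar")   # paths of the tar shards
    dp = FileOpener(dp, mode="b")          # (path, binary stream) per shard
    dp = dp.load_from_tar()                # (member path, stream) per file in the tar
    # Assumption: exactly two files (jpg + cls) per sample.
    dp = dp.groupby(sample_key, group_size=2, guaranteed_group_size=2)
    return dp.map(to_sample)               # {"jpg": b"...", "cls": b"..."}
```

Iterating over build_pipeline("path/to/shards") should then give one dict per sample. Note that os.path.splitext splits at the last dot, so this key function would need changing if your file names contain extra dots or multi-part extensions.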