How to create a PyTorch dataset from a "filepath label" format

I want to create a PyTorch dataset for RVL-CDIP to feed to a PyTorch DataLoader. This is the format:

imagesq/q/o/c/qoc54c00/80035521.tif 15
where 15 is the label.

Is there a PyTorch datasets function to create such a dataset?

I would suggest writing a custom Dataset, as described here. This lets you define the logic to load and process each sample yourself. Based on your description, I guess each path and label are stored together as a string or tuple, so splitting them into a list of image paths and a corresponding target tensor should work.
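As a rough sketch of that idea, something like the following might work. It assumes the labels live in a plain text file with one "path label" pair per line, separated by whitespace, and that the images can be opened with Pillow and treated as grayscale. The names `parse_label_line`, `PathLabelDataset`, `label_file`, and `root_dir` are made up for illustration and are not part of PyTorch.

```python
import os

import torch
from PIL import Image
from torch.utils.data import Dataset


def parse_label_line(line):
    """Split 'some/path.tif 15' into ('some/path.tif', 15).

    Hypothetical helper: splits on the last run of whitespace,
    so the label is assumed to be the final token on the line.
    """
    path, label = line.strip().rsplit(maxsplit=1)
    return path, int(label)


class PathLabelDataset(Dataset):
    """Hypothetical dataset reading 'path label' lines from a text file."""

    def __init__(self, label_file, root_dir="", transform=None):
        self.root_dir = root_dir
        self.transform = transform
        with open(label_file) as f:
            # Skip blank lines; keep (relative_path, int_label) pairs.
            self.samples = [parse_label_line(line) for line in f if line.strip()]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        # Assumption: grayscale input, hence mode "L".
        with Image.open(os.path.join(self.root_dir, path)) as img:
            image = img.convert("L")
        if self.transform is not None:
            image = self.transform(image)
        return image, torch.tensor(label, dtype=torch.long)
```

To use this with a DataLoader you would most likely need to pass a `transform` that resizes the images to a common size and converts them to tensors (for example with torchvision transforms), since the default collate function cannot batch PIL images or tensors of different shapes.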
