How to generate a dataloader with file paths in pytorch?

theProcrastinatr · March 31, 2022, 9:51am

Hi,
I have an image dataset which consists of a csv file and two folders containing images. The csv file also contains the correct paths to all the images in the two folders.

Now I’m trying to generate a DataLoader by creating a Dataset object from the file paths in the csv file. However, since the paths are of type str or pathlib.PosixPath, they won’t work with the DataLoader, which expects tensors or similar data.

So, is there any way to make use of the paths present in the DataFrame? I ask because the data is quite large and won’t fit in memory if I try to load it all at once.

Matias_Vasquez · March 31, 2022, 10:39am

This example, taken from here, shows how to use a csv file to create your own Dataset.

In the __getitem__() method you can define how you want to load the images. This means you can take the str that says where an image is stored and read the file only when that sample is requested, so nothing has to be loaded up front.

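import os

import numpy as np
import pandas as pd
import torch
from skimage import io
from torch.utils.data import Dataset


class FaceLandmarksDataset(Dataset):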
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:]
        landmarks = np.array([landmarks])
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample
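
For the case in the question, where the csv already holds the full path to each image, a minimal sketch could look like the one below. It is an illustration under assumptions rather than a definitive implementation: the class name PathImageDataset, the helper make_demo_frame, and the column names "path" and "label" are all hypothetical, and Pillow is used for reading instead of skimage. The demo writes a few tiny random images to a temporary folder so the snippet runs on its own; with real data you would pass the DataFrame read from your csv instead.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class PathImageDataset(Dataset):
    """Hypothetical dataset: keeps only paths in memory and loads images lazily."""

    def __init__(self, frame, path_col="path", label_col="label"):
        # "path" and "label" are assumed column names, not taken from the thread.
        self.frame = frame.reset_index(drop=True)
        self.path_col = path_col
        self.label_col = label_col

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        # str() also covers entries stored as pathlib.Path objects.
        with Image.open(str(row[self.path_col])) as img:
            array = np.asarray(img.convert("RGB"), dtype=np.float32) / 255.0
        # Convert height x width x channels to channels x height x width.
        image = torch.from_numpy(array).permute(2, 0, 1)
        return image, int(row[self.label_col])


def make_demo_frame(folder, n=4, size=8):
    """Write n tiny random images into folder and return a DataFrame of their paths."""
    rng = np.random.default_rng(0)
    rows = []
    for i in range(n):
        path = os.path.join(folder, f"img_{i}.png")
        pixels = rng.integers(0, 256, size=(size, size, 3), dtype=np.uint8)
        Image.fromarray(pixels).save(path)
        rows.append({"path": path, "label": i % 2})
    return pd.DataFrame(rows)


with tempfile.TemporaryDirectory() as tmp:
    frame = make_demo_frame(tmp)
    loader = DataLoader(PathImageDataset(frame), batch_size=2, shuffle=False)
    # Each batch is read from disk only while the loader iterates.
    batches = [(images.shape, labels.tolist()) for images, labels in loader]
```

Because __getitem__ returns tensors and ints, the default collate function of the DataLoader can stack them into batches, while the paths themselves never leave the Dataset.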

theProcrastinatr · April 1, 2022, 5:25am

I ended up doing just that too. Thanks @Matias_Vasquez for confirming that it’s the way to go.