Hi,
I have an image dataset that consists of a csv file and two folders containing images. The csv file contains the correct paths to all the images in both folders.

Now I'm trying to create a DataLoader by building a Dataset object from the file paths in the csv file. But the paths are either str or pathlib.PosixPath, which won't work with the DataLoader, as it expects Tensors and similar data.

So, is there any way to make use of the paths present in the DataFrame? I ask because the data is quite large and won't fit in memory if I try to load it all at once.

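This example, taken from here, shows you how to use a csv file to create your own Dataset.

In the __getitem__() method you can define how you want to load the images. This means you can take the str that says where your image is and read the file from there, one sample at a time. The DataLoader only batches whatever __getitem__() returns, so the paths themselves never have to be Tensors and the full dataset never has to sit in memory.
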
import os

import numpy as np
import pandas as pd
import torch
from skimage import io
from torch.utils.data import Dataset


class FaceLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        # Only the csv table is held in memory, not the images.
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        # Column 0 holds the file name; the image is read only when requested.
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)

        # The remaining columns hold the landmark coordinates as (x, y) pairs.
        landmarks = self.landmarks_frame.iloc[idx, 1:]
        landmarks = np.array([landmarks])
        landmarks = landmarks.astype('float').reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample
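
For the case in the question, where the csv already holds the full path to every image, the same idea might look roughly like the sketch below. This is only an illustration under some assumptions: the class name PathImageDataset, the column names "path" and "label", and the fixed resize to 64x64 are all made up for the example and are not from the original post. It uses Pillow instead of skimage to read the files.

```python
from pathlib import Path

import numpy as np
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class PathImageDataset(Dataset):
    """Hypothetical dataset that reads one image per index from a DataFrame of paths."""

    def __init__(self, frame, path_col="path", label_col="label", size=(64, 64)):
        # Assumed column names; only the table of paths and labels stays in memory.
        self.frame = frame.reset_index(drop=True)
        self.path_col = path_col
        self.label_col = label_col
        # Fixed (width, height) so the default collate function can stack images.
        self.size = size

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        # str and pathlib.Path both work here; the file is opened only now.
        with Image.open(Path(row[self.path_col])) as img:
            img = img.convert("RGB").resize(self.size)
            array = np.asarray(img, dtype=np.uint8).copy()
        # HWC uint8 -> CHW float32 scaled to [0, 1].
        image = torch.from_numpy(array).permute(2, 0, 1).float() / 255.0
        label = int(row[self.label_col])
        return image, label
```

With a DataFrame df that has those two columns, something like DataLoader(PathImageDataset(df), batch_size=32, shuffle=True) should then yield batches of image tensors and labels, reading only the files needed for each batch.
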
I ended up doing just that too. Thanks @Matias_Vasquez for confirming that it's the way to go.