Custom Dataset labeling from CSV

Nikronic · September 9, 2019, 8:06pm

Hi,

You can easily define your own custom dataset class to handle this kind of situation.
Here is the top-level structure of the class you can implement:


from torch.utils.data import Dataset


class PlacesDataset(Dataset):
    def __init__(self):
        # Initialize variables such as the path to the csv file, the image folder and the transforms.
        raise NotImplementedError

    def __len__(self):
        # Return a single integer: the length of your dataset. In your case, that is the
        # number of images in your train folder or the number of lines in the csv file.
        raise NotImplementedError

    def __getitem__(self, index):
        # This is the most important part: read one image from the folder and its label
        # from the csv file, and return only a pair of (image, class). You only need to
        # handle 1 sample here, as if your whole dataset had a single image. DataLoader
        # calls this method once per index and collates the results into batches
        # (in parallel worker processes when num_workers > 0).
        raise NotImplementedError
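
As a rough illustration, the skeleton above could be filled in along these lines. This is only a sketch under stated assumptions, not the code from the linked repository: the class name `CsvImageDataset`, the `loader` parameter, and the CSV layout (a header row with `filename` and `label` columns) are all hypothetical, and the fallback base class exists only so the sketch can be run without PyTorch installed.

```python
import csv
import os

try:
    from torch.utils.data import Dataset
except ImportError:  # assumption: fall back so the sketch runs without PyTorch
    Dataset = object


def pil_loader(path):
    # Default image reader. Pillow is imported lazily so that a different
    # loader can be supplied instead.
    from PIL import Image
    with Image.open(path) as img:
        return img.convert("RGB")


class CsvImageDataset(Dataset):
    """Hypothetical dataset: one (filename, label) row per image in a CSV file."""

    def __init__(self, csv_path, img_dir, transform=None, loader=pil_loader):
        self.img_dir = img_dir
        self.transform = transform
        self.loader = loader
        with open(csv_path, newline="") as f:
            # Assumed header row: filename,label
            self.samples = [(row["filename"], row["label"])
                            for row in csv.DictReader(f)]
        # Map each distinct label string to an integer class index.
        classes = sorted({label for _, label in self.samples})
        self.class_to_idx = {name: i for i, name in enumerate(classes)}

    def __len__(self):
        # One sample per CSV row.
        return len(self.samples)

    def __getitem__(self, index):
        # Handle exactly one sample: load the image, look up its class index.
        filename, label = self.samples[index]
        image = self.loader(os.path.join(self.img_dir, filename))
        if self.transform is not None:
            image = self.transform(image)
        return image, self.class_to_idx[label]
```

With PyTorch installed, an instance can be wrapped like any other dataset, for example `DataLoader(dataset, batch_size=32, shuffle=True)`. A transform such as `ToTensor()` would be needed in that case so that the default collation can stack the images into a batch.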

Now you can do with this class whatever you wanted to do with ImageFolder.
I know the explanation is abstract, but this is the whole idea. If you need real, working code, the link below points to my own implementation, which uses a csv file to read images and generate labels on the fly.

github.com/Nikronic/CoarseNet/blob/master/utils/preprocess.py

from __future__ import print_function, division
from PIL import Image
from torchvision.transforms import ToTensor, ToPILImage, Normalize, Compose
from torch.utils.data import DataLoader
import numpy as np
import random

import tarfile
import io
import os
import pandas as pd

from torch.utils.data import Dataset
import torch

from utils.Halftone.halftone import generate_halftone


class PlacesDataset(Dataset):
    def __init__(self, txt_path='filelist.txt', img_dir='data', transform=None, test=False):

This file has been truncated.

If you have any questions, feel free to ask.

Best,
Nik