Usage of DataLoader (images with ground truth)

I am doing some projects with the dataset from

These datasets are image files named with specific rules described on the link above.
From the filename, I can extract the angle information of the image.

The problem appears when I load the files.

I load the images with a DataLoader (torch.utils.data) wrapping torchvision’s ImageFolder.

The relevant code is below (adapted from the PyTorch tutorial).

import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
import torchvision
import torchvision.transforms as transforms
import string
import math

def imshow(img):
    img = img / 2 + 0.5
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

path = 'C:\\Users\\~~~~~~'

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.ImageFolder(path, transform=transform),
    batch_size=4)

dataiter = iter(train_loader)
images, labels = next(dataiter)

# show images
# print(images[0])

I load the images in batches (batch size 4), but I can’t get the filenames of the images.

My questions are simple.

  1. Can I load the filenames together with the images?

  2. I created the angle data separately. Can I load each angle together with its image, element by element? (Not in a fixed order, because I want to train the network on shuffled data.)

  3. Are there any other recommended ways to train on a dataset like this?

If the target is encoded in the file name, I would recommend writing a custom Dataset.
Something like this should work:

import torch
from PIL import Image
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, image_paths, transform=None):
        self.image_paths = image_paths
        self.transform = transform

    def get_target_from_path(self, path):
        # Implement your target extraction here
        return torch.tensor([0])

    def __getitem__(self, index):
        # Open the image only when the sample is requested
        x = Image.open(self.image_paths[index])
        y = self.get_target_from_path(self.image_paths[index])
        if self.transform:
            x = self.transform(x)
        return x, y

    def __len__(self):
        return len(self.image_paths)

If you would also like to return the image names, just add them to the return statement in __getitem__.
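
As a rough illustration of that idea, the sketch below returns the file name as a third element and checks that image, target and name stay paired even with shuffle=True. It is a toy setup, not the original dataset: the file names are made up, and a constant tensor stands in for the loaded image.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class PathDataset(Dataset):
    """Toy dataset returning a fake image, a target and the file name."""

    def __init__(self, image_paths):
        self.image_paths = image_paths

    def __getitem__(self, index):
        path = self.image_paths[index]
        # Stand-in for Image.open(path) + transform: a tensor filled with the index
        x = torch.full((3, 8, 8), float(index))
        y = torch.tensor(index)
        return x, y, path

    def __len__(self):
        return len(self.image_paths)


paths = [f"img_{i}.png" for i in range(12)]  # hypothetical file names
loader = DataLoader(PathDataset(paths), batch_size=4, shuffle=True)

for x, y, names in loader:
    # names is a sequence of 4 strings in the same shuffled order as x and y
    for img, target, name in zip(x, y, names):
        assert name == f"img_{int(target)}.png"
        assert img[0, 0, 0].item() == float(target)
```

Because every sample is assembled inside __getitem__ from a single index, shuffling only changes the order of the indices and never separates an image from its target or name.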


In addition to what @ptrblck said, you will probably need a custom collate_fn because tensors can’t contain strings.
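
If a custom collate_fn does turn out to be necessary (for example for fields the default collation cannot batch), a minimal sketch could look like the one below. The dataset and the function name collate_with_names are made up for illustration; the only DataLoader feature assumed is its collate_fn argument.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Returns a random image, an integer target and a made-up file name."""

    def __len__(self):
        return 6

    def __getitem__(self, index):
        return torch.randn(3, 4, 4), torch.tensor(index), f"file_{index}.png"


def collate_with_names(batch):
    """Stack the tensors and keep the file names as a plain list of str."""
    images, targets, names = zip(*batch)
    return torch.stack(images), torch.stack(targets), list(names)


loader = DataLoader(ToyDataset(), batch_size=3, collate_fn=collate_with_names)
images, targets, names = next(iter(loader))
print(images.shape, targets.tolist(), names)
```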

That’s an interesting point, and I thought the same. However, returning a random string seems to work for me:

class MyDataset(Dataset):
    def __init__(self, image_paths):
        self.image_paths = image_paths    
    def __getitem__(self, index):
        x = torch.randn(3, 24, 24)
        y = torch.randint(0, 10, (1,))
        return x, y, 'lala'
    def __len__(self):
        return len(self.image_paths)

dataset = MyDataset(['']*100)
loader = DataLoader(dataset, batch_size=10)
x, y, s = next(iter(loader))
print(s)
> ('lala', 'lala', 'lala', 'lala', 'lala', 'lala', 'lala', 'lala', 'lala', 'lala')

That’s strange. I’m pretty sure this wasn’t possible in past releases, and I don’t know what changed here… Maybe it was such a common issue that the default collate_fn was changed?

It works. Thank you @ptrblck

import os
from PIL import Image

class MyDataset(Dataset):
    def __init__(self, image_paths, dir, transform=None):
        self.image_paths = image_paths
        self.transform = transform
        self.dir = dir

    def get_target_from_path(self, path):
        # Return the target (vertical and horizontal angle) encoded in the file name
        # Find the index of the last sign character ('+' or '-')
        rear = path.rfind('+')
        rear_ = path.rfind('-')
        if rear < rear_:
            rear = rear_

        # Convert the sign characters into numeric signs
        pm1 = pm2 = 1
        if path[13] == '-': pm1 = -1
        if path[rear] == '-': pm2 = -1

        # Digits between the two signs, and between the last sign and the extension
        v_ang = pm1 * int(path[14:rear])
        h_ang = pm2 * int(path[rear+1:len(path)-4])

        return v_ang, h_ang

    def __getitem__(self, index):
        # Open the image file and get the angle data from the file name
        image = Image.open(os.path.join(self.dir, self.image_paths[index]))
        v_ang, h_ang = self.get_target_from_path(self.image_paths[index])

        # Transform the image
        if self.transform:
            image = self.transform(image)
        return image, v_ang, h_ang

    def __len__(self):
        # Return the total number of samples
        return len(self.image_paths)

This is the code I ended up with.
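
The fixed index 13 in get_target_from_path depends on the exact length of the file name prefix. A regular expression is one way to make the extraction less position-dependent. The sketch below assumes that every file name ends with two signed integers (vertical angle, then horizontal angle) right before the extension; the example names and the function name angles_from_filename are illustrative, not taken from the dataset description.

```python
import re

# Assumption: the name ends with two signed integers followed by the extension,
# e.g. "...-15+30.jpg" means vertical angle -15, horizontal angle +30.
ANGLE_PATTERN = re.compile(r"([+-]\d+)([+-]\d+)\.\w+$")


def angles_from_filename(filename):
    """Return (vertical_angle, horizontal_angle) parsed from the file name."""
    match = ANGLE_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"No angle pair found in file name: {filename}")
    # int() accepts a leading '+' or '-' sign directly
    return int(match.group(1)), int(match.group(2))


print(angles_from_filename("personne01146-15+30.jpg"))  # (-15, 30)
print(angles_from_filename("personne01146+0-90.jpg"))   # (0, -90)
```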

Hi, I was advised to load images dynamically, which is what get_target_from_path is supposed to do. But I don’t quite understand the path we pass in here or how exactly it is used. The example from @lzozo95 isn’t clear to me, and I don’t understand how get_target_from_path works. I’m new to PyTorch and deep learning, so it would be nice if you could explain more. Thank you.

The original author used this method to extract the target from the file name string.
How are your targets stored?

Thanks for the reply. My images and targets (masks) are stored in two separate folders.

This is my dataset class:

import logging
from glob import glob
from os import listdir
from os.path import splitext

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class BasicDataset(Dataset):
    def __init__(self, imgs_dir, masks_dir, scale=1):
        self.imgs_dir = imgs_dir
        self.masks_dir = masks_dir
        self.scale = scale
        assert 0 < scale <= 1, 'Scale must be between 0 and 1'

        # One ID per image file (name without extension), skipping hidden files
        self.ids = [splitext(file)[0] for file in listdir(imgs_dir)
                    if not file.startswith('.')]
        logging.info(f'Creating dataset with {len(self.ids)} examples')

    def __len__(self):
        return len(self.ids)

    @classmethod
    def preprocess(cls, pil_img, scale):
        w, h = pil_img.size
        newW, newH = int(scale * w), int(scale * h)
        assert newW > 0 and newH > 0, 'Scale is too small'
        pil_img = pil_img.resize((newW, newH))

        img_nd = np.array(pil_img)

        # Grayscale images get an explicit channel axis
        if len(img_nd.shape) == 2:
            img_nd = np.expand_dims(img_nd, axis=2)

        # HWC to CHW
        img_trans = img_nd.transpose((2, 0, 1))
        if img_trans.max() > 1:
            img_trans = img_trans / 255

        return img_trans

    #def get_target_from_path(self,imgs_dir,masks_dir):
       # return

    def __getitem__(self, i):
        idx = self.ids[i]
        mask_file = glob(self.masks_dir + idx + '*')
        img_file = glob(self.imgs_dir + idx + '*')

        assert len(mask_file) == 1, \
            f'Either no mask or multiple masks found for the ID {idx}: {mask_file}'
        assert len(img_file) == 1, \
            f'Either no image or multiple images found for the ID {idx}: {img_file}'
        mask = Image.open(mask_file[0])
        img = Image.open(img_file[0])
        ## dynamically loading the images
        #mask_y = self.get_target_from_path(mask_file[0])
        #img_y = self.get_target_from_path(img_file[0])

        assert img.size == mask.size, \
            f'Image and mask {idx} should be the same size, but are {img.size} and {mask.size}'

        img = self.preprocess(img, self.scale)
        mask = self.preprocess(mask, self.scale)

        # return mask_y and img_y as well
        return {'image': torch.from_numpy(img), 'mask': torch.from_numpy(mask)}

And I use it like this in main:

dir_img = 'data/imgs/'
dir_mask = 'data/masks/'

dataset = BasicDataset(dir_img, dir_mask, img_scale)
n_val = int(len(dataset) * val_percent)
n_train = len(dataset) - n_val
train, val = random_split(dataset, [n_train, n_val])

(I should mention that the code comes from an open-source project.)

It seems you are dealing with input images and masks, which could be a segmentation use case?
If that’s the case, the mask should be your target. I’m not sure how your use case is related to the get_target_from_path method.

Yes, I am dealing with a segmentation task. There is a choice between loading the whole dataset from the SSD into RAM and loading the images dynamically. The idea is to save memory. I now know what I should do: load the masks using get_target_from_path instead of mask = Image.open(mask_file[0]).
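
For reference, dynamic (lazy) loading in a Dataset usually just means storing file paths in __init__ and opening the files inside __getitem__, so only the samples of the current batch are held in memory. The sketch below illustrates that pattern with two tiny synthetic files; the class name LazySegmentationDataset and the file layout are assumptions made for the example, not part of the code posted in this thread.

```python
import os
import tempfile

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class LazySegmentationDataset(Dataset):
    """Keeps only file paths in memory; pixels are read in __getitem__."""

    def __init__(self, img_paths, mask_paths):
        assert len(img_paths) == len(mask_paths)
        self.img_paths = img_paths
        self.mask_paths = mask_paths

    def __len__(self):
        return len(self.img_paths)

    def __getitem__(self, i):
        # Files are opened here, one sample at a time, not in __init__
        img = np.array(Image.open(self.img_paths[i]), dtype=np.float32) / 255.0
        mask = np.array(Image.open(self.mask_paths[i]), dtype=np.int64)
        image = torch.from_numpy(img).permute(2, 0, 1)  # HWC -> CHW
        return {'image': image, 'mask': torch.from_numpy(mask)}


# Two tiny synthetic image/mask pairs so the sketch runs on its own
tmp_dir = tempfile.mkdtemp()
img_paths, mask_paths = [], []
for k in range(2):
    img_path = os.path.join(tmp_dir, f"img_{k}.png")
    mask_path = os.path.join(tmp_dir, f"mask_{k}.png")
    Image.fromarray(np.full((4, 5, 3), 255, dtype=np.uint8)).save(img_path)
    Image.fromarray(np.full((4, 5), k, dtype=np.uint8)).save(mask_path)
    img_paths.append(img_path)
    mask_paths.append(mask_path)

dataset = LazySegmentationDataset(img_paths, mask_paths)
sample = dataset[1]
print(sample['image'].shape, sample['mask'].shape)
```

The mask returned under the 'mask' key plays the role of the target, so no separate target-extraction method is needed in this layout.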