DataLoader with list of features for FeatureExtract, and ROI

Hello everyone!

I am trying to reproduce the following paper using the KITTI dataset.

Source | Paper

Unfortunately, I have no clue how to pass the different features to the DataLoader and retrieve them in the different steps (image*, bounding boxes for ROI pooling, types/classes, and additional features of the KITTI dataset), and of course the label: distance (which is already concatenated into my text files).

*The image is the “main feature” from which the features should be extracted by the feature extractor.
My custom Dataset class currently looks like this:

import csv
import os
import pandas as pd
from typing import Any, Callable, List, Optional, Tuple

from PIL import Image
from torchvision.datasets import VisionDataset

# df_dict = {'type': 0, 'truncated': 1, 'occluded': 2, 'alpha': 3, 'bbox': 4, 'dimensions': 5, 'location': 6, 'rotation_y': 7, 'forward_distance': 8, 'forward_keypoint': 9}
# df_types_dict = {'Car': 0, 'Van': 1, 'Truck': 2,'Pedestrian': 3, 'Person_sitting': 4, 'Cyclist': 5, 'Tram': 6, 'Misc': 7, 'DontCare': 8}

class KittiDist(VisionDataset):
    """KittiDist-Class for `KittiDist`_Dataset.

    It corresponds to the "left color images of object" dataset, for object detection.

    Args:
        root (string): Root directory where images are downloaded to.
            Expects the following folder structure if download=False:

            .. code::

                    └── KITTI-Dist --> with develop=True: "TEST_SET"
                        ├── images
                        └── distLabel

        develop (bool, optional): If true, another File-Path is used. Could be used if
            during development another (smaller, special selected) Dataset-Folder is needed.
        train (bool, optional): Use ``train`` split if true, else ``test`` split.
            Defaults to ``train``.
        transform (callable, optional): A function/transform that takes in a PIL image
            and returns a transformed version. E.g, ``transforms.PILToTensor``
        target_transform (callable, optional): A function/transform that takes in the
            target and transforms it.
        transforms (callable, optional): A function/transform that takes input sample
            and its target as entry and returns a transformed version.


    # Dataset
    # Numbers of images: 007480.png (last image)

    image_dir_name = "images"
    labels_dir_name = "distLabel"

    main_dir_name = "KITTI-Dist"
    develop_dir_name = "TEST_SET"
    # Set DevMode (Small Dataset), else whole Dataset
    # develop: bool = True

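    def __init__(
            self,
            root: str,
            develop: bool = False,
            train: bool = True,
            transform: Optional[Callable] = None,
            target_transform: Optional[Callable] = None,
            transforms: Optional[Callable] = None,
    ):
        # VisionDataset stores root and builds self.transforms from these arguments.
        super().__init__(
            root,
            transform=transform,
            target_transform=target_transform,
            transforms=transforms,
        )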
        self.images = [] # A list of all images
        self.targets = [] # A list of all Anno-Files (distLabel), xy.targets = [idx] will return a certain path.
        self.root = root
        self.train = train
        self.develop = develop
        self._location = self.develop_dir_name if self.develop else self.main_dir_name

        image_dir = os.path.join(self.root, self._location, self.image_dir_name)

        if self.train:
            labels_dir = os.path.join(self.root, self._location, self.labels_dir_name)

        # Append the path of every image found in image_dir to self.images.
        for img_file in os.listdir(image_dir):
            self.images.append(os.path.join(image_dir, img_file))
            if self.train:
                self.targets.append(os.path.join(labels_dir, f"{img_file.split('.')[0]}.txt"))

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        """Get item at a given index.

        Args:
            index (int): Index
        Returns:
            tuple: (image, target), where
            target is a list of dictionaries with the following keys:

            - type: str
            - truncated: float
            - occluded: int
            - alpha: float
            - bbox: float[4]
            - dimensions: float[3]
            - location: float[3]
            - rotation_y: float
            - forward_distance: float
            - forward_keypoint: float[2]

        """
        image = Image.open(self.images[index])
        target = self._parse_target(index) if self.train else None
        if self.transforms:
            image, target = self.transforms(image, target)
        return image, target

    def _parse_target(self, index: int) -> List:
        target = []
        with open(self.targets[index]) as inp:
            content = csv.reader(inp, delimiter=" ")
            for line in content:
                target.append(
                    {
                        "type": line[0],
                        "truncated": float(line[1]),
                        "occluded": int(line[2]),
                        "alpha": float(line[3]),
                        "bbox": [float(x) for x in line[4:8]],
                        "dimensions": [float(x) for x in line[8:11]],
                        "location": [float(x) for x in line[11:14]],
                        "rotation_y": float(line[14]),
                        # Distance and coordinates of the LiDAR point to which the computed distance belongs.
                        "forward_distance": float(line[15]),
                        # 16:18 covers columns 16 and 17 (the end index is exclusive); compare with location, whose 11:14 ends at column 13.
                        "forward_keypoint": [float(x) for x in line[16:18]],
                    }
                )

        return target

    def __len__(self) -> int:
        return len(self.images)

The “TEST_SET” is just another folder with fewer images, used during development.
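The column layout that `_parse_target` relies on can be checked on a single line. This is only a sketch: `parse_label_line` is a hypothetical helper name, the sample values are made up, and the 18-column layout (15 standard KITTI columns plus distance and keypoint) is taken from the class above.

```python
from typing import Any, Dict, List


def parse_label_line(fields: List[str]) -> Dict[str, Any]:
    """Convert one whitespace-split label line (18 columns) into a dict."""
    if len(fields) != 18:
        raise ValueError(f"expected 18 columns, got {len(fields)}")
    return {
        "type": fields[0],
        "truncated": float(fields[1]),
        "occluded": int(fields[2]),
        "alpha": float(fields[3]),
        "bbox": [float(x) for x in fields[4:8]],          # columns 4..7
        "dimensions": [float(x) for x in fields[8:11]],   # columns 8..10
        "location": [float(x) for x in fields[11:14]],    # columns 11..13
        "rotation_y": float(fields[14]),
        "forward_distance": float(fields[15]),
        "forward_keypoint": [float(x) for x in fields[16:18]],  # columns 16..17
    }


# Made-up line, only to show which slice picks up which columns.
sample = (
    "Car 0.00 0 -1.57 100.0 150.0 200.0 250.0 "
    "1.5 1.6 3.9 1.0 1.7 20.0 -1.6 19.8 150.0 200.0"
)
parsed = parse_label_line(sample.split())
print(parsed["bbox"], parsed["forward_distance"], parsed["forward_keypoint"])
```

Since the end index of a Python slice is exclusive, `16:18` returns exactly the two keypoint values.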

In addition, here is my resize/augmentation function, which is not yet integrated into the Dataset class:

import albumentations 
import cv2 
import numpy as np 

def resize_img_bb_kyp(img_arr, bboxes, keypoints, height, width, flip_prob):
    """
    :param img_arr: original image as a numpy array
    :param bboxes: bboxes as numpy array where each row is 'x_min', 'y_min', 'x_max', 'y_max'
    :param keypoints: keypoints as numpy array, each row is 'x', 'y'
    :param height: resized height dimension of image
    :param width: resized width dimension of image
    :param flip_prob: flip probability (unused in the lines shown here)
    :return: dictionary with the transformed (flipped?) image, the transformed bboxes
        ['x_min', 'y_min', 'x_max', 'y_max'] and the transformed keypoints
    Adapted from:
    """
    # resize transform pipeline
    transform_img_bb_kyp = albumentations.Compose(
        [albumentations.Resize(height=height, width=width, always_apply=True)],
        bbox_params=albumentations.BboxParams(format='pascal_voc', label_fields=[]),
        # keypoints are passed in the call below, so Compose also needs keypoint_params
        keypoint_params=albumentations.KeypointParams(format='xy'),
    )

    transformed_img_bb_kyp = transform_img_bb_kyp(image=img_arr, bboxes=bboxes, keypoints=keypoints)

    return transformed_img_bb_kyp
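To make clear what a plain resize has to do to the annotations, here is the arithmetic written out by hand. It is a rough sketch of proportional scaling only (`scale_boxes_and_keypoints` is a hypothetical name); albumentations may differ in details such as rounding or clipping.

```python
from typing import List, Sequence, Tuple


def scale_boxes_and_keypoints(
    boxes: Sequence[Sequence[float]],
    keypoints: Sequence[Sequence[float]],
    old_size: Tuple[int, int],
    new_size: Tuple[int, int],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Rescale pascal_voc boxes and (x, y) keypoints for a plain resize.

    old_size and new_size are given as (height, width).
    """
    sy = new_size[0] / old_size[0]  # vertical scale factor
    sx = new_size[1] / old_size[1]  # horizontal scale factor
    new_boxes = [[x0 * sx, y0 * sy, x1 * sx, y1 * sy] for x0, y0, x1, y1 in boxes]
    new_keypoints = [[x * sx, y * sy] for x, y in keypoints]
    return new_boxes, new_keypoints


boxes, kps = scale_boxes_and_keypoints(
    boxes=[[100.0, 50.0, 300.0, 150.0]],
    keypoints=[[200.0, 100.0]],
    old_size=(400, 1200),
    new_size=(200, 300),
)
print(boxes)  # [[25.0, 25.0, 75.0, 75.0]]
print(kps)    # [[50.0, 50.0]]
```

x coordinates are multiplied by the width ratio and y coordinates by the height ratio, which is why image, boxes, and keypoints have to go through the same transform call.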

So as an overview:
main feature: image – for the CNN feature extractor
feature: bounding boxes – for ROI pooling
feature: keypoints (maybe used later)
feature: type (object classes) – not sure whether I will really use it, since I want to use a classifier
label: distance

I have one big folder with all the images and another folder for the text files (which contain the features and the label).

So I struggle with the following questions:

  1. How do I pass the several features and the label (distance) to the DataLoader?
  2. What does the DataLoader look like?
  3. How do I access the different features after handing them to the DataLoader?
  4. Is there a smooth way to integrate my resize/augmentation function into the Dataset class, or do I have to use PyTorch's transforms? (I am using my own resize function, since PyTorch's transforms don't transform the coordinates of the bounding boxes and keypoints.)

I would really appreciate any help.
– Thanks and enjoy your day.

P.S. This issue didn’t really help me …

Nobody? :confused:
Could you maybe give me a hint, @ptrblck?

You wouldn’t change anything in the DataLoader, but would implement a custom Dataset returning the desired features, targets, etc.
I’m unsure how to understand points 1, 2, and 3, since you would directly pass the Dataset to the DataLoader.
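As a rough sketch of that idea (not the actual KittiDist class): the Dataset returns an image plus a dict of tensors, and a small `collate_fn` keeps the per-image dicts in a list. `ToyDistDataset`, `collate_keep_targets`, and the dict keys are made-up names for illustration, and the data is random. The custom `collate_fn` is only needed because the number of objects differs per image; the default collate function would try to stack the box tensors and fail on the differing shapes.

```python
from typing import Dict, List, Tuple

import torch
from torch.utils.data import DataLoader, Dataset

Target = Dict[str, torch.Tensor]


class ToyDistDataset(Dataset):
    """Stand-in dataset: random images with a variable number of objects."""

    def __init__(self, num_samples: int = 4) -> None:
        self.num_samples = num_samples

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, Target]:
        num_objects = index + 1  # differs per image, like real annotations
        image = torch.rand(3, 64, 128)
        top_left = torch.rand(num_objects, 2) * 32
        target = {
            # each row: x_min, y_min, x_max, y_max
            "boxes": torch.cat([top_left, top_left + 16], dim=1),
            "keypoints": torch.rand(num_objects, 2) * 64,
            "labels": torch.zeros(num_objects, dtype=torch.int64),
            "distance": torch.rand(num_objects) * 50,
        }
        return image, target


def collate_keep_targets(
    batch: List[Tuple[torch.Tensor, Target]]
) -> Tuple[torch.Tensor, List[Target]]:
    """Stack the images into one tensor, keep the target dicts as a list."""
    images, targets = zip(*batch)
    return torch.stack(images), list(targets)


loader = DataLoader(ToyDistDataset(), batch_size=2, collate_fn=collate_keep_targets)
images, targets = next(iter(loader))
print(images.shape)               # batch of images for the CNN
print(targets[1]["boxes"].shape)  # boxes of the second image in the batch
```

After that, every feature is addressed by its key, e.g. `targets[i]["boxes"]` or `targets[i]["distance"]`.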

For 4: Yes, torchvision.transforms or transformations from e.g. albumentations should work.
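One possible way to wire a joint transform into the Dataset, as a minimal sketch: the Dataset accepts a callable that is invoked with `image=`, `bboxes=`, and `keypoints=` and returns a dict with the same keys, which is how an `albumentations.Compose` pipeline is called. `JointTransformDataset` and `fake_halve` are hypothetical names, and the class is kept plain Python here; in practice it would usually subclass `torch.utils.data.Dataset`.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class JointTransformDataset:
    """Map-style dataset applying one transform to image, boxes and keypoints."""

    def __init__(
        self,
        samples: List[Dict[str, Any]],
        joint_transform: Optional[Callable[..., Dict[str, Any]]] = None,
    ) -> None:
        self.samples = samples
        self.joint_transform = joint_transform

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Any, Dict[str, Any]]:
        sample = self.samples[index]
        image = sample["image"]
        boxes = sample["bboxes"]
        keypoints = sample["keypoints"]
        if self.joint_transform is not None:
            # One call transforms everything consistently.
            out = self.joint_transform(image=image, bboxes=boxes, keypoints=keypoints)
            image, boxes, keypoints = out["image"], out["bboxes"], out["keypoints"]
        target = {"boxes": boxes, "keypoints": keypoints, "distance": sample["distance"]}
        return image, target


def fake_halve(image: Any, bboxes: List[List[float]], keypoints: List[List[float]]) -> Dict[str, Any]:
    """Stand-in for a real pipeline: halves all coordinates, leaves the image alone."""
    return {
        "image": image,
        "bboxes": [[v / 2 for v in box] for box in bboxes],
        "keypoints": [[v / 2 for v in kp] for kp in keypoints],
    }


samples = [
    {
        "image": "img0",  # placeholder for the image array
        "bboxes": [[10.0, 20.0, 30.0, 40.0]],
        "keypoints": [[20.0, 30.0]],
        "distance": [15.0],
    }
]
dataset = JointTransformDataset(samples, joint_transform=fake_halve)
image, target = dataset[0]
print(target["boxes"], target["keypoints"])
```

Passing the pipeline in as an argument keeps the Dataset independent of any particular augmentation library.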

Thank you very much for your reply! @ptrblck
1) How to pass the several features and the label (distance) to the DataLoader?
My custom Dataset returns different features and one label.

1. feature: image – for CNN feature extractor
2. feature: BoundingBoxes – for ROI pooling
3. feature: keypoints (maybe used later)
4. feature: type (object classes) – not sure whether I will really use it, since I want to use a classifier
y: label: distance

So can I just return “image” and “target” from the custom Dataset (with “target” as a dict or pandas DataFrame holding the bounding boxes, keypoints, type, and the label), and the DataLoader is fine with this?

2) What does the DataLoader look like?
So I don’t have to adapt the DataLoader?

3) How do I access/use the different features after handing it to the DataLoader?
So how do I hand the individual features over to where they are needed, i.e. how do I address them?
→ Image-Feature to the CNN
→ Bounding boxes (which have to be a Tensor) to the roi_align function
→ Keypoints …

→ label: “distance” for the loss-calc?

All the examples I saw work with X and y, since there is one feature and one label.
I have to tell the CNN to use the feature “image”, the roi-align to use the bounding boxes, …
I don’t really understand how to do this, since there is just one y as a “big Tensor” (?) with all the features in it.

Like it’s totally different from accessing a pandas-DataFrame (mydataframe.boxes, …) or Dictionary (mydataset[1]) …

Hope that makes it a bit easier to understand, what my problem is …

Thanks anyway for helping out :slight_smile: and greetings.