Building custom dataset, how to return ids as well?


#1

So the format of a custom dataset should be like the following:

import torch
from torch.utils import data

class Dataset(data.Dataset):
    """Characterizes a dataset for PyTorch."""

    def __init__(self, list_IDs, labels):
        """Initialization."""
        self.labels = labels
        self.list_IDs = list_IDs

    def __len__(self):
        """Denotes the total number of samples."""
        return len(self.list_IDs)

    def __getitem__(self, index):
        """Generates one sample of data."""
        # Select sample
        ID = self.list_IDs[index]

        # Load data and get label
        X = torch.load('data/' + ID + '.pt')
        y = self.labels[ID]

        return X, y

I would like to have the ID in the output in addition to X and y, so I changed the return statement to return X, y, ID. But now when I do

data_loader = data.DataLoader(dataset, batch_size=args.batch_size,
                              num_workers=args.num_workers,
                              shuffle=True)

batch_iterator = iter(data_loader)
# 'ids' instead of 'id' to avoid shadowing the built-in id()
images, targets, ids = next(batch_iterator)

I receive an error. Does anyone know why?


(Justus Schock) #2

All data returned by a dataset needs to be a tensor if you want to use the default collate_fn of the DataLoader. You have two options: write a custom collate function and pass it to the DataLoader, or wrap your ID inside a tensor (which is simpler, I guess) and unwrap it outside the DataLoader.
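
For the first option, a custom collate function might look roughly like the sketch below. The name collate_with_ids is made up for illustration, and the sketch assumes that every X has the same shape and that the labels are plain Python numbers; adjust it if your data differs.

```python
import torch

def collate_with_ids(batch):
    """Collate a list of (X, y, ID) samples into one batch.

    Tensors are stacked; the string IDs are kept as a plain Python list.
    Hypothetical helper, not part of PyTorch.
    """
    xs, ys, ids = zip(*batch)
    X = torch.stack(xs)        # assumes all X tensors share one shape
    y = torch.as_tensor(ys)    # assumes labels are plain numbers
    return X, y, list(ids)
```

It would then be passed to the loader as data.DataLoader(dataset, batch_size=..., collate_fn=collate_with_ids), so that each batch comes back as (X, y, ids) with ids being a list of strings.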


#3

How can we wrap a string in a tensor? :thinking:


(Justus Schock) #4

Ah sorry, I assumed your ID would be an integer. You cannot wrap a string in a tensor. I could think of some ways to achieve something like that, but it would not be very PyTorch-like. If you are interested in these ways, you can PM me.


(Juan F Montesinos) #5

A very weird way, but you can just convert the string into integers through an ASCII table and convert them back to a string by calling a function.

Got this from the internet:

>>> s = 'hi'
>>> [ord(c) for c in s]
[104, 105]
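
To make the round trip concrete, here is a minimal sketch of the idea. The helpers encode_id and decode_id are hypothetical names, and the sketch assumes IDs are no longer than a fixed length and never contain the NUL character (code 0), which is used as padding so that all encoded IDs have the same length and can be stacked into one batch.

```python
def encode_id(s, length=16, pad=0):
    """Turn a string into a fixed-length list of character codes."""
    codes = [ord(c) for c in s]
    if len(codes) > length:
        raise ValueError("ID is longer than the fixed length")
    # Pad on the right so every encoded ID has the same length
    return codes + [pad] * (length - len(codes))

def decode_id(codes, pad=0):
    """Turn a sequence of character codes back into a string."""
    # int() also accepts zero-dimensional tensors, so a tensor row works here
    return ''.join(chr(int(c)) for c in codes if int(c) != pad)
```

Inside __getitem__ the dataset could then return torch.tensor(encode_id(ID)) in place of the raw string, and after batching each row of the ID tensor can be passed to decode_id to get the original string back.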

(Justus Schock) #6

That’s what I thought about too. I also thought about wrapping the loader itself, but one would have to define a new iterator for this. I proposed another method, and if this method works (currently waiting for verification), I will post it here later on.
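
For reference, the loader-wrapping idea could look roughly like the sketch below. This is only an illustration of that idea, not the other method mentioned above. It assumes the dataset's __getitem__ returns the integer index in place of the string ID (so each batch is X, y, idx), and LoaderWithIDs is a made-up name.

```python
class LoaderWithIDs:
    """Wrap a loader yielding (X, y, idx) and replace idx with string IDs.

    Hypothetical wrapper; assumes idx holds integer positions into list_IDs.
    """

    def __init__(self, loader, list_IDs):
        self.loader = loader
        self.list_IDs = list_IDs

    def __len__(self):
        return len(self.loader)

    def __iter__(self):
        for X, y, idx in self.loader:
            # Look up the string ID for every integer index in the batch
            yield X, y, [self.list_IDs[int(i)] for i in idx]
```

Iterating over LoaderWithIDs(data_loader, list_IDs) then yields batches of (X, y, ids), with ids being a list of strings.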


(Justus Schock) #7

@isalirezag reported this to work great.