Building custom dataset, how to return ids as well?

The format of a custom dataset should look like the following:

import torch
from torch.utils import data

class Dataset(data.Dataset):
    'Characterizes a dataset for PyTorch'
    def __init__(self, list_IDs, labels):
        self.labels = labels
        self.list_IDs = list_IDs

    def __len__(self):
        'Denotes the total number of samples'
        return len(self.list_IDs)

    def __getitem__(self, index):
        'Generates one sample of data'
        # Select sample
        ID = self.list_IDs[index]

        # Load data and get label
        X = torch.load('data/' + ID + '.pt')
        y = self.labels[ID]

        return X, y

I would like to have the ID in the output in addition to X and y, so I changed the return statement to return X, y, ID, but now when I do

data_loader = data.DataLoader(dataset, args.batch_size,
                                  shuffle=True )

batch_iterator = iter(data_loader)
images, targets, id  = next(batch_iterator)

I receive an error. Does anyone know why?

All data returned by a dataset needs to be a tensor if you want to use the default collate_fn of the DataLoader. You have two options: write a custom collate function and pass it to the DataLoader, or wrap your ID inside a tensor (which is simpler, I guess) and unwrap it outside the DataLoader.
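For the first option, here is a minimal sketch of a custom collate function that stacks X and y into tensors and leaves the string IDs as a plain Python list. ToyDataset, collate_with_ids and the sample IDs are hypothetical names for illustration, and the tensors are generated in memory instead of being loaded with torch.load, so the snippet runs without any files on disk.

```python
import torch
from torch.utils import data


class ToyDataset(data.Dataset):
    """Hypothetical in-memory stand-in for the dataset in the question."""

    def __init__(self, list_IDs, labels):
        self.list_IDs = list_IDs
        self.labels = labels

    def __len__(self):
        return len(self.list_IDs)

    def __getitem__(self, index):
        ID = self.list_IDs[index]
        # Placeholder for torch.load('data/' + ID + '.pt')
        X = torch.full((3,), float(index))
        y = self.labels[ID]
        return X, y, ID


def collate_with_ids(batch):
    """Stack X and y into tensors; keep the string IDs as a list."""
    xs, ys, ids = zip(*batch)
    return torch.stack(xs), torch.tensor(ys), list(ids)


list_IDs = ["id-a", "id-b", "id-c", "id-d"]
labels = {"id-a": 0, "id-b": 1, "id-c": 0, "id-d": 1}

loader = data.DataLoader(
    ToyDataset(list_IDs, labels),
    batch_size=2,
    shuffle=False,  # kept deterministic so the first batch is predictable
    collate_fn=collate_with_ids,
)

images, targets, ids = next(iter(loader))
```

With this collate function the loader yields a (tensor, tensor, list of str) triple per batch, so the IDs stay aligned with the rows of images and targets.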

How can we wrap a string in a tensor?

Ah sorry, I assumed your ID would be an integer. You cannot wrap a string in a tensor. I can think of some ways to achieve something like that, but they would not be very PyTorch-like. If you are interested in these ways, you can PM me.
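If an integer is acceptable, one possible workaround is to return the integer index of the sample and look the string ID up again outside the loader. This is only a sketch under that assumption, not necessarily the method discussed in this thread; IndexedDataset and the sample IDs are hypothetical, and the data is created in memory instead of being loaded from disk.

```python
import torch
from torch.utils import data


class IndexedDataset(data.Dataset):
    """Hypothetical dataset that returns the sample index instead of the ID."""

    def __init__(self, list_IDs, labels):
        self.list_IDs = list_IDs
        self.labels = labels

    def __len__(self):
        return len(self.list_IDs)

    def __getitem__(self, index):
        ID = self.list_IDs[index]
        X = torch.zeros(3)  # placeholder for torch.load('data/' + ID + '.pt')
        y = self.labels[ID]
        # The index is a plain int, which the default collate turns into a tensor.
        return X, y, index


list_IDs = ["id-a", "id-b", "id-c", "id-d"]
labels = {"id-a": 0, "id-b": 1, "id-c": 2, "id-d": 3}
dataset = IndexedDataset(list_IDs, labels)

loader = data.DataLoader(dataset, batch_size=2, shuffle=True)
images, targets, idx = next(iter(loader))

# Unwrap outside the loader: map the integer indices back to string IDs.
ids = [dataset.list_IDs[i] for i in idx.tolist()]
```

Because the index travels through the batch together with X and y, the lookup stays correct even with shuffle=True.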

A very weird way, but you can convert the string into integers through an ASCII table and convert them back to a string by calling a function.

Found this on the internet:

>>> s = 'hi'
>>> [ord(c) for c in s]
[104, 105]
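Extending the ord() idea above, here is a rough sketch of a round trip through a tensor. Since tensors in a batch need the same shape to be stacked, the codes are padded to a fixed length. encode_id, decode_id, MAX_LEN and PAD are hypothetical names, and the sketch assumes that no ID is longer than MAX_LEN and that no ID contains the NUL character used for padding.

```python
import torch

MAX_LEN = 16  # assumed upper bound on the length of an ID
PAD = 0       # padding code; assumes IDs never contain '\0'


def encode_id(s, max_len=MAX_LEN):
    """Turn a string into a fixed-length tensor of character codes."""
    codes = [ord(c) for c in s]
    if len(codes) > max_len:
        raise ValueError("ID is longer than max_len")
    codes = codes + [PAD] * (max_len - len(codes))
    return torch.tensor(codes, dtype=torch.long)


def decode_id(t):
    """Turn a tensor of character codes back into a string, dropping padding."""
    return "".join(chr(c) for c in t.tolist() if c != PAD)


# Encoded IDs all have shape (MAX_LEN,), so they can be stacked into a batch.
batch = torch.stack([encode_id("hi"), encode_id("sample_42")])
decoded = [decode_id(row) for row in batch]
```

The dataset would return encode_id(ID) from __getitem__, and decode_id would be applied to each row of the batched tensor after it leaves the loader.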

That’s what I thought about too. I also thought about wrapping the loader itself, but one would have to define a new iterator for that. I proposed another method, and if it works (currently waiting for verification), I will post it here later on.

@isalirezag reported this to work great.


Sorry for the late reply, but I would like to ask: can collate_fn now return a dict in which some of the values are strings?
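As far as I know, the DataLoader simply yields whatever a custom collate_fn returns, so a dict with string values should work. A minimal sketch, where collate_to_dict, the dict keys and the toy samples are hypothetical and a plain list of tuples stands in for a real dataset:

```python
import torch
from torch.utils import data


def collate_to_dict(batch):
    """Collate (X, y, ID) samples into a dict; the IDs stay a list of strings."""
    xs, ys, ids = zip(*batch)
    return {
        "image": torch.stack(xs),
        "target": torch.tensor(ys),
        "id": list(ids),
    }


# A plain list supports __getitem__ and __len__, so it can act as a dataset.
samples = [
    (torch.zeros(3), 0, "id-a"),
    (torch.ones(3), 1, "id-b"),
]

loader = data.DataLoader(samples, batch_size=2, collate_fn=collate_to_dict)
batch = next(iter(loader))
```

In recent PyTorch releases the default collate function also appears to pass strings through unchanged and to handle dict samples key by key, so a dataset that returns a dict with a string field may work even without a custom collate_fn; checking this against the installed version is advisable.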