What should I output when defining an image-to-value dataset?

This is the first time I am defining my own dataset.
I want to define a dataset in which each entry is one image associated with a value.
When I define the dataset, should self.samples be an array of:
a string containing the image path on disk and a floating-point value
or
the actual RGB pixel matrix of the image and a floating-point value
?
When I instantiate the dataset, should I use the following syntax:

train_data=torchvision.datasets.MYDATA('../mydata', train=True, download=True,
                   transform=torchvision.transforms.Compose([
                       torchvision.transforms.Resize((h_image, w_image)),
                       torchvision.transforms.ToTensor()
                   ]))

or

train_data=MYDATA('../mydata', train=True, download=True,
                   transform=torchvision.transforms.Compose([
                       torchvision.transforms.Resize((h_image, w_image)),
                       torchvision.transforms.ToTensor()
                   ]))

What conditions do I need to meet to use the transforms utilities, such as rotate and flip?
I hope I described my question clearly.
Thanks.

I would recommend sticking to the first approach: store only the paths and lazily load the samples in the __getitem__ method, as this will save memory.
You would then initialize your dataset through its __init__ method, so the second call looks more correct. You would still have to make sure all arguments are really expected (e.g. the download argument might not be needed).
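A minimal sketch of the lazy-loading approach described above. The class name ImageValueDataset, the samples argument, and the generated demo images are hypothetical placeholders, not part of the original code; the only point is that the dataset keeps paths in memory and opens each image in __getitem__.

```python
import os
import tempfile

from PIL import Image
from torch.utils.data import Dataset


class ImageValueDataset(Dataset):
    """Hypothetical dataset: keeps (path, value) pairs and loads images lazily."""

    def __init__(self, samples, transform=None):
        # samples: list of (image_path, float_value) tuples; no pixels are loaded here
        self.samples = list(samples)
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, value = self.samples[idx]
        # The image is read from disk only when this sample is requested.
        image = Image.open(path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, value


# Tiny demo with two generated images standing in for real data.
tmp_dir = tempfile.mkdtemp()
demo_samples = []
for i, value in enumerate([0.5, 1.5]):
    path = os.path.join(tmp_dir, f"img_{i}.png")
    Image.new("RGB", (8, 6), color=(i * 100, 0, 0)).save(path)
    demo_samples.append((path, value))

dataset = ImageValueDataset(demo_samples)
image, value = dataset[1]  # the file is opened here, not in __init__
```

Because only strings and floats live in self.samples, memory use stays small regardless of how large the images are.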

Thanks.

So, you are suggesting the following?
Define Data:

import numpy
import PIL.Image
from torch.utils.data import Dataset

class MYDATA(Dataset):
    # parameters after root need defaults too, otherwise this is a SyntaxError
    def __init__(self, root='folder_path', parm1=None, parm2=None, parm3=None, parm4=None, train=True, data_augmentation=False):
        super(MYDATA, self).__init__()
        if parm1:
            ...
        if parm2:
            ...
        if parm3:
            ...
        if parm4:
            ...
        if train:
            ...
        else:
            ...
        #after a bunch of no_matter_what operations
        self.samples = [(path, value), ...]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # load the image lazily from its stored path
        image = PIL.Image.open(self.samples[idx][0])
        imagearray = numpy.asarray(image)
        return (imagearray, self.samples[idx][1])

Get Data:

train_data=MYDATA('../mydata', train=True, parm1=parm1,
                   transform=torchvision.transforms.Compose([
                       torchvision.transforms.Resize((h_image, w_image)),
                       torchvision.transforms.ToTensor()
                   ]))

What I understand now is:
1. I can prepare my data in __init__() according to some conditions, for example parm1, parm2, …

What I don’t understand is:
1. When is __getitem__() called? It is never called explicitly in the Get Data code.
2. I never define any transform behavior in the MYDATA class. Can I still use the torchvision.transforms tools?
3. If I want to do data augmentation, for example add some rotated and flipped copies of the same image and associate them with the same value, should I do it explicitly in __init__()?
4. Can torchvision.transforms handle the data augmentation for me? Or can it only change (rotate, flip) and replace the original data entry instead of adding new entries?
5. If I define everything in the MYDATA class __init__(), including data augmentation, and convert the image path to an array in __getitem__(), do I still need those torchvision.transforms calls? What exactly does torchvision.transforms.ToTensor() do?

I think those are all the questions I can think of right now. (It is a lot, ^_^)

Thank you very much!

  1. The __getitem__ method is called when the Dataset is indexed, either directly:
x, y = dataset[index]

or from the DataLoader by e.g. iterating it:

for data, target in loader:
    ...

Internally the DataLoader will use the sampler to create indices and index the internal Dataset with them.

  2. Yes, you should pass the transformations as an object to the Dataset.__init__ method and use it in __getitem__. Usually something like this is used:
def __init__(self, transform=None):
    self.data = ...
    self.transform = transform
    ...

def __getitem__(self, index):
    x = self.data[index]
    if self.transform:
        x = self.transform(x)
    ...
    return x, y
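  3. This would be a valid approach, but usually data augmentation is done on-the-fly: a single sample is returned in the __getitem__ method, and random transformations are applied to it each time it is loaded, so the model sees a different variant in each epoch.

  4. Yes, torchvision.transforms also provides transformations for rotation etc. They transform the sample being returned rather than add new entries to the dataset.

  5. Yes, you need to call the transformations in the __getitem__.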

Thanks!

If I want to use data.to(device) later to move the data onto the GPU, do I need to write anything in the MYDATA class? Thanks.

No, the common approach is to move the data to the GPU in the DataLoader loop.
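A short sketch of that pattern, assuming a stand-in TensorDataset with random tensors in place of MYDATA (the shapes and batch size are arbitrary). The dataset itself knows nothing about devices; each batch is moved inside the loop.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Fall back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in data: 6 random "images" with one float value each.
dataset = TensorDataset(torch.rand(6, 3, 8, 8), torch.rand(6))
loader = DataLoader(dataset, batch_size=3)

seen_devices = []
for data, target in loader:
    # Move each batch to the target device right before using it.
    data = data.to(device)
    target = target.to(device)
    seen_devices.append(data.device.type)
    # ... forward pass, loss, and optimizer step would go here
```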