Applying transforms to a dataset loaded from an HDF5 file

I am reading images from an H5 file as below:

syn1 = hf['data_1']
syn1 = np.array(syn1[:,:,:])
....
....
synImage1 = torch.utils.data.DataLoader(syn1, batch_size=50)

But the DataLoader does not apply any transform, so I cannot normalize the images there.

Can we wrap the data read from the H5 file in a Dataset separately in order to normalize it?
How can I apply a transform independently of a Dataset class?

The torchvision transformations usually work on PIL.Images or tensors.
You could convert your numpy arrays to tensors inside a Dataset and then apply the transformations.

If you don’t want to use a Dataset, you could normalize the images using pure numpy.
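For example, a minimal pure-numpy sketch (assuming syn1 is the array read from the H5 file as in your snippet, and normalizing it with its own mean and standard deviation):

import numpy as np

# syn1 is the numpy array read from the H5 file, as in the snippet above
mean = syn1.mean()
std = syn1.std()
syn1 = (syn1 - mean) / std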

I am looking for an example of converting a numpy array into a Dataset so that I can apply a transform while doing so.

You could try something like this:

import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, numpy_arr, mean, std):
        self.data = torch.from_numpy(numpy_arr)
        self.mean = mean
        self.std = std
        # alternatively, normalize once here instead of in __getitem__

    def __getitem__(self, index):
        data = self.data[index]
        # normalize the sample (skip if already normalized in __init__)
        data = data - self.mean
        data = data / self.std
        return data

    def __len__(self):
        return len(self.data)

With this approach you would need to load all the data into memory beforehand. I think HDF5 can also load the data lazily.
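For reference, a rough sketch of a lazily loading Dataset could look like this (the class name LazyH5Dataset, the file path, and the mean/std values are placeholders; the key 'data_1' is taken from your snippet):

import h5py
import torch
from torch.utils.data import Dataset

class LazyH5Dataset(Dataset):
    def __init__(self, h5_path, key, mean, std):
        self.h5_path = h5_path
        self.key = key
        self.mean = mean
        self.std = std
        # only read the length here; the samples stay on disk
        with h5py.File(h5_path, 'r') as f:
            self.length = f[key].shape[0]
        self.file = None

    def __getitem__(self, index):
        # open the file lazily so each DataLoader worker gets its own handle
        if self.file is None:
            self.file = h5py.File(self.h5_path, 'r')
        data = torch.from_numpy(self.file[self.key][index])
        return (data - self.mean) / self.std

    def __len__(self):
        return self.length

# usage (placeholder path and statistics)
dataset = LazyH5Dataset('images.h5', 'data_1', mean=0.5, std=0.5)
loader = torch.utils.data.DataLoader(dataset, batch_size=50)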
Let me know if this works for you!
