Applying transforms to a dataset loaded from an HDF5 file

I am reading images from an H5 file as below:

syn1 = hf['data_1']
syn1 = np.array(syn1[:,:,:])
....
....
synImage1 = torch.utils.data.DataLoader(syn1, batch_size=50)

But the DataLoader does not apply any transform, so I cannot normalize the images there.

Can we wrap the data read from the H5 file in a Dataset separately in order to normalize it?
How can I apply a transform independently of a Dataset class?

The torchvision transformations usually work on PIL.Images or tensors.
You could convert your numpy arrays to tensors inside a Dataset and then apply the transformations.

If you don’t want to use a Dataset, you could normalize the images using pure numpy.
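For example, a minimal pure-numpy sketch (assuming syn1 is the array read from the H5 file as in your snippet, and normalizing it with its own mean and standard deviation):

import numpy as np

# syn1 is the numpy array read from the H5 file, as in the snippet above
mean = syn1.mean()
std = syn1.std()
syn1 = (syn1 - mean) / std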

I am looking for an example of converting a numpy array into a Dataset so that I can apply a transform while doing so.

You could try something like this:

import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, numpy_arr, mean, std):
        self.data = torch.from_numpy(numpy_arr)
        self.mean = mean
        self.std = std
        # alternatively, normalize once here instead of in __getitem__

    def __getitem__(self, index):
        data = self.data[index]
        # normalize the sample (skip if already normalized in __init__)
        data = data - self.mean
        data = data / self.std
        return data

    def __len__(self):
        return len(self.data)

With this approach you would need to load all the data into memory beforehand. I think HDF5 can also load the data lazily.
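For reference, a rough sketch of a lazily loading Dataset could look like this (the class name LazyH5Dataset, the file path, and the mean/std values are placeholders; the key 'data_1' is taken from your snippet):

import h5py
import torch
from torch.utils.data import Dataset

class LazyH5Dataset(Dataset):
    def __init__(self, h5_path, key, mean, std):
        self.h5_path = h5_path
        self.key = key
        self.mean = mean
        self.std = std
        # only read the length here; the samples stay on disk
        with h5py.File(h5_path, 'r') as f:
            self.length = f[key].shape[0]
        self.file = None

    def __getitem__(self, index):
        # open the file lazily so each DataLoader worker gets its own handle
        if self.file is None:
            self.file = h5py.File(self.h5_path, 'r')
        data = torch.from_numpy(self.file[self.key][index])
        return (data - self.mean) / self.std

    def __len__(self):
        return self.length

# usage (placeholder path and statistics)
dataset = LazyH5Dataset('images.h5', 'data_1', mean=0.5, std=0.5)
loader = torch.utils.data.DataLoader(dataset, batch_size=50)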
Let me know if this works for you!
