How can I read HDF5 files stored as 1-D arrays and view them as images?

I have a large image classification dataset stored in .hdf5 format. Both the labels and the images are stored in the .hdf5 file. I am unable to view the images because they are stored as arrays. The code I used to read the dataset is as follows:

    import h5py
    import numpy
    # Open the file read-only and select the top-level group
    f = h5py.File('data/images.hdf5', 'r')
    group = f['datasets']


Now, when I read the 'car' entry from the group, I get the following output:

    data = group['car']

    ((51,), (383275,), (257120,))

So it looks like there are 51 images for the label car, and the images are stored as 1-D arrays of length 383275 and 257120, with no information about their height and width. I want to save the images as RGB again.
Next, following the code here, I tried to read the images:

    import numpy as np
    from PIL import Image
    # hdf = h5py.File("Sample.h5",'r')
    array = data[0]
    img = Image.fromarray(array.astype('uint8'), 'RGB')
    img.save("yourimage.thumbnail", "JPEG")

Unfortunately, I get the following error:

    File /usr/local/lib/python3.8/dist-packages/PIL/, in Image.frombytes(self, data, decoder_name, *args)
        781 s = d.decode(data)
        783 if s[0] >= 0:
    --> 784     raise ValueError("not enough image data")
        785 if s[1] != 0:
        786     raise ValueError("cannot decode image data")
    ValueError: not enough image data

I have already checked references such as PyTorch discussion forum questions like these, the HDF Group help library, etc.
Any help would be greatly appreciated. Thanks.

Without access to the file it’s pretty hard to say, but as a shot in the dark: check whether whoever made the file attached the size information to the dataset’s attributes (i.e., data.attrs). If that works out, you’ll want to numpy.reshape each 1-D array into height x width x 3 (the layout PIL expects for RGB) and then convert those to images.
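
As a rough sketch of that suggestion, the snippet below prints whatever attributes are attached to a dataset and reshapes a flat array once a height and width are known. The helper names (dump_attrs, flat_to_rgb, decode_encoded) are made up for illustration, and nothing in the post confirms that the file stores a size at all. Note also that neither 383275 nor 257120 is a multiple of 3, so those two arrays cannot be raw height x width x 3 pixel data as they stand. One possibility, purely an assumption, is that each entry holds the bytes of an encoded image file such as a JPEG, in which case decoding may be worth trying instead of reshaping.

```python
import io

import numpy as np


def dump_attrs(path, group_name, dataset_name):
    """Print the attributes attached to one dataset (needs h5py)."""
    import h5py  # imported here so the other helpers work without it

    with h5py.File(path, "r") as f:
        data = f[group_name][dataset_name]
        # attrs behaves like a dict; it is empty if nothing was attached
        for key, value in data.attrs.items():
            print(key, value)


def flat_to_rgb(flat, height, width):
    """Reshape a flat pixel array to a (height, width, 3) uint8 array."""
    flat = np.asarray(flat)
    if flat.size != height * width * 3:
        raise ValueError(
            f"{flat.size} values cannot fill a {height}x{width} RGB image"
        )
    return flat.reshape(height, width, 3).astype(np.uint8)


def decode_encoded(flat):
    """Assumption: flat holds the bytes of an encoded image file (JPEG, PNG, ...)."""
    from PIL import Image

    raw = np.asarray(flat, dtype=np.uint8).tobytes()
    # Image.open reads the size from the file header, so no reshape is needed
    return Image.open(io.BytesIO(raw)).convert("RGB")
```

With the file from the question, this might be called as dump_attrs('data/images.hdf5', 'datasets', 'car'). If a height and width turn up, Image.fromarray(flat_to_rgb(data[0], height, width)) should accept the result; if not, decode_encoded(data[0]) is a cheap thing to try.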
