Reading Images in .mat format in Python

Hello, I have recently moved from MATLAB to Python for a deep learning task. I have images saved from MATLAB in .mat format.
I found this code, which uses a folder structure for labelling the data similar to mine, so I decided to adopt it and modify it to read .mat files.
Each .mat file has the size 256x256x11 (11 is the number of channels). I also found that I can load the .mat files using scipy.io.loadmat(), but I am not sure where I have to add that and what else I need to change.

https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/5_DataManagement/build_an_image_dataset.py

I think writing a custom Dataset class that calls scipy.io.loadmat() in its __getitem__() method would be the easiest option. Note, however, that with 11 channels you won't be able to convert or transform the images with PIL.

It depends on how your dataset is structured, but something like this:

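import os

import scipy.io
import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):
    """custom dataset for .mat images"""

    def __init__(self, list_of_urls, variable_name):
        self.list_of_urls = list_of_urls  # paths to the .mat files
        self.variable_name = variable_name  # name of the array saved in each file

    def __len__(self):
        return len(self.list_of_urls)

    def __getitem__(self, index):
        image_url = self.list_of_urls[index]
        image = scipy.io.loadmat(image_url)[self.variable_name]  # H x W x C
        # assumes the parent folder is named after the class, e.g. ".../High/1/x.mat"
        label = int(os.path.basename(os.path.dirname(image_url)))
        # PyTorch expects channels first: C x H x W
        image = torch.from_numpy(image).permute(2, 0, 1).float()
        return image, label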

So the functions you have defined work by first reading the list of files in a folder and then loading the files from that folder one by one?

The other solution I was thinking of: in my main root directory I have 7 folders (7 classes), and each folder has 100 images. So I would read the images folder by folder and put them all in one 4D array (256x256x11x700), or in seven separate arrays.

import glob

DATASET_PATH = "D:/Dataset/Multi-resolution_data/Visual/High/"  # the dataset root folder path
files = glob.glob(DATASET_PATH + "**/*.mat", recursive=True)
for f in files:
    print(f)

Yes. The advantage of the Dataset class is that you can pass it to a DataLoader and use multiple workers to load and pre-process the files in parallel.
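As a rough sketch of that workflow, assuming your files can be read by scipy.io.loadmat (MATLAB v7 or older) and that every file stores its array under the same variable name (`example` below is a hypothetical placeholder), the dataset could be wrapped in a DataLoader like this. `MatDataset` and `make_loader` are illustrative names, not part of PyTorch.

```python
import os

import scipy.io
import torch
from torch.utils.data import DataLoader, Dataset


class MatDataset(Dataset):
    """Lazily loads one H x W x C array per .mat file (hypothetical helper)."""

    def __init__(self, files, variable_name="example"):
        self.files = files
        self.variable_name = variable_name  # assumed name of the stored array
        # Assumes each file sits in a class folder; map sorted folder names to 0..K-1
        folders = sorted({os.path.basename(os.path.dirname(f)) for f in files})
        self.class_to_idx = {name: i for i, name in enumerate(folders)}

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        path = self.files[index]
        array = scipy.io.loadmat(path)[self.variable_name]
        # Move channels first (C x H x W), as PyTorch layers expect
        image = torch.from_numpy(array).permute(2, 0, 1).float()
        label = self.class_to_idx[os.path.basename(os.path.dirname(path))]
        return image, label


def make_loader(files, variable_name="example", batch_size=16, num_workers=4):
    """Wrap the dataset in a DataLoader; num_workers > 0 loads files in parallel."""
    dataset = MatDataset(files, variable_name)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers)
```

On Windows, a script that uses num_workers > 0 needs its top-level code under an `if __name__ == "__main__":` guard, because the worker processes re-import the script.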

That's also an option if you have enough memory to hold the whole 256x256x11xN array at once, and once it is loaded it will be much quicker than lazily reading images from disk.
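To check whether the whole array fits in memory, multiply the number of elements by the bytes per element. A quick back-of-the-envelope calculation for 700 images of 256x256x11 is below; the three dtypes are just common choices, so check what your files actually contain with `array.dtype`. MATLAB stores arrays as double by default, and loadmat returns those as float64, so the first line is the likely case unless the data was saved as single or an integer type.

```python
import numpy as np


def array_nbytes(shape, dtype):
    """Bytes needed to hold a dense array of this shape and dtype."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize


shape = (256, 256, 11, 700)  # H x W x C x N, as in the 4D array discussed above
for dtype in ("float64", "float32", "uint8"):
    gib = array_nbytes(shape, dtype) / 2**30
    print(f"{dtype}: {gib:.2f} GiB")
# prints:
# float64: 3.76 GiB
# float32: 1.88 GiB
# uint8: 0.47 GiB
```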

Since you have your classes in separate folders, there is also the option of sub-classing ImageFolder to load .mat files, though I’ve never tried this.

I have tried the ImageFolder class, but it only supports image formats and does not work with .mat files.
Could you please show me how to write a loop that reads the files from the directory and stores them in a 4D array?

If you really want to go the 4D array route, this is how I would approach it (it might not be optimal).
Note that scipy.io.loadmat() returns a dictionary, so you'll need to know the name of the variable stored within it.
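If you don't remember which name the array was saved under, a small helper like the hypothetical `list_mat_variables` below shows the names and shapes of the arrays in a file, so you can see which key to use (scipy also provides `scipy.io.whosmat` for a similar listing). It assumes a pre-v7.3 file that scipy can read.

```python
import scipy.io


def list_mat_variables(path):
    """Return {name: shape} for the user variables stored in a (pre-v7.3) .mat file."""
    mat_dict = scipy.io.loadmat(path)
    # loadmat also adds metadata entries such as '__header__', '__version__', '__globals__'
    return {name: value.shape for name, value in mat_dict.items()
            if not name.startswith("__")}
```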

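import glob
import os

import numpy as np
import scipy.io
import torch

DATASET_PATH = "D:/Dataset/Multi-resolution_data/Visual/High/"
VARIABLE_NAME = "example"  # name of the array stored inside each .mat file
files = sorted(glob.glob(DATASET_PATH + "**/*.mat", recursive=True))

# the parent folder of each file is its class name (dirname handles / and \ on Windows)
class_labels = [os.path.basename(os.path.dirname(f)) for f in files]
# map class names to integers 0..K-1 (sorted so the mapping is reproducible)
class_dict = {label: index for index, label in enumerate(sorted(set(class_labels)))}
class_idxs = [class_dict[label] for label in class_labels]

array_store = []

for f in files:
    mat_dict = scipy.io.loadmat(f)  # dict of variable name -> array
    img = mat_dict[VARIABLE_NAME]   # H x W x C, e.g. 256 x 256 x 11
    array_store.append(img)

stacked_array = np.stack(array_store, axis=3)  # H x W x C x N

# convert to torch tensors in the N x C x H x W layout PyTorch expects
images = torch.from_numpy(stacked_array).permute(3, 2, 0, 1).float()
labels = torch.tensor(class_idxs)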

As you mentioned earlier, loadmat gives a dictionary, so I thought I would check it. It's actually an image of 256x256x11. Is there any other way I can change the format of the data and load it in Python easily?
When I simply try to load any single file like this:
from scipy.io import loadmat
img = loadmat('D:/Dataset/Multi-resolution_data/Visual/High/1/image100_256.mat')

It gives an error:

Traceback (most recent call last):

  File "<ipython-input-28-d42e90ca2d13>", line 1, in <module>
    mat_dict = loadmat('D:/Dataset/Multi-resolution_data/Visual/High/1/image100_256.mat');

  File "d:\python\python37\lib\site-packages\scipy\io\matlab\mio.py", line 217, in loadmat
    MR, _ = mat_reader_factory(f, **kwargs)

  File "d:\python\python37\lib\site-packages\scipy\io\matlab\mio.py", line 78, in mat_reader_factory
    raise NotImplementedError('Please use HDF reader for matlab v7.3 files')

NotImplementedError: Please use HDF reader for matlab v7.3 files

This line is showing a syntax error, and I don't quite understand what it is doing.

Ah, good catch, that line is all kinds of wrong. It was supposed to convert your class labels into integers, but it looks like you already have them labelled as integers, so it is redundant.
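If the class folders are already named with integers such as 1 to 7, the label can come straight from the folder name. One thing worth checking: PyTorch's `nn.CrossEntropyLoss` expects class indices in the range 0 to C-1, so folders numbered from 1 would need shifting down by one. A minimal sketch, assuming every file sits directly inside a folder named after its class (`label_from_path` is a hypothetical helper):

```python
import os


def label_from_path(path, first_class=1):
    """Turn '.../High/3/image100_256.mat' into a 0-based class index (here 2).

    Assumes the parent folder name is an integer and classes start at `first_class`.
    """
    folder = os.path.basename(os.path.dirname(path))
    return int(folder) - first_class
```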

As for opening MATLAB v7.3 files, scipy.io.loadmat does not support them (hence the NotImplementedError), so you will have to use h5py instead.
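A minimal sketch of that, assuming h5py is installed and reusing the hypothetical variable name `example`: MATLAB v7.3 files are HDF5 files, so `h5py.is_hdf5` can tell the two formats apart. Because MATLAB writes arrays in column-major order, h5py usually reports the dimensions reversed (a 256x256x11 MATLAB array shows up as 11x256x256), so the sketch transposes it back. It is worth checking the shape against one file you know before relying on it.

```python
import h5py
import numpy as np
import scipy.io


def load_mat_array(path, variable_name="example"):
    """Load one numeric array from a .mat file, whichever format it was saved in."""
    if h5py.is_hdf5(path):
        # MATLAB v7.3 file: read the dataset with h5py
        with h5py.File(path, "r") as f:
            data = f[variable_name][()]
        # Reverse the axes to undo MATLAB's column-major layout (11x256x256 -> 256x256x11)
        return np.transpose(data)
    # Older (v7 and below) files still work with scipy
    return scipy.io.loadmat(path)[variable_name]
```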

I've implemented a library that can read MATLAB v7.3 .mat files; have a look here:

https://github.com/skjerns/mat7.3

or install it via pip (pip install mat73) and load a file with:

import mat73
data_dict = mat73.loadmat('data.mat')