How to create Dataset out of Matlab Array?

FreSch48 · December 7, 2021, 10:25am

Hi guys,

I am absolutely new here and therefore have little or no experience with PyTorch.
Could someone tell me the best way to use a data array from Matlab as a dataset, preferably with the data normalized, e.g. to the range [0, 1]?

Here is a little more detail about my data:
I have a two-dimensional array whose rows correspond to frames from several spectrograms. The array is 60000 x 1025, i.e. 60000 frames with 1025 frequency bins each. Since I am using an autoencoder, my input data is also my target data.

I already know that I can use mat73.loadmat(path) to read my Matlab data into Python as an array. The only remaining question is: how can I create a normalized dataset from it?

I hope you guys can help me.
Thanks in advance

mMagmer · December 7, 2021, 3:02pm

If you are loading your data into a NumPy array, do the following:

dataset = torch.from_numpy(data)

And if your data is loaded into a Python list:

dataset = torch.tensor(data)
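
One detail worth checking at this step is the dtype. A minimal sketch, assuming the MATLAB array arrives as float64 (the usual case for MATLAB doubles); the random array is only a stand-in for the real data:

```python
import numpy as np
import torch

# Stand-in for the array read from the .mat file (assumed float64, frames x bins).
data = np.random.rand(8, 1025)

dataset = torch.from_numpy(data)  # shares memory with `data`; dtype stays float64
dataset32 = dataset.float()       # float32 copy, matching default nn.Module weights

print(dataset.dtype, dataset32.dtype)
```

Feeding a float64 tensor into a model with default float32 weights typically raises a dtype mismatch error, so converting once up front tends to be simpler than converting every batch.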

For data normalization (min-max scaling of each column to [0, 1]):

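# per-column (per frequency bin) minimum and maximum, each of shape 1 x 1025
dMin = dataset.min(dim=0, keepdim=True).values
dMax = dataset.max(dim=0, keepdim=True).values
# scale every column into [0, 1]; a column whose values are all equal yields NaN
NDataset = (dataset - dMin) / (dMax - dMin)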
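
The normalization above can also be wrapped in a small helper. This is only a sketch: the function name `minmax_normalize`, the `per_column` switch, and the `eps` guard are my own additions, not part of PyTorch. The guard keeps constant columns from producing NaN, and the global option scales with one min/max for the whole array, which may or may not suit spectrogram data better than per-bin scaling.

```python
import torch

def minmax_normalize(x, per_column=True, eps=1e-12):
    """Scale x into [0, 1]; constant columns map to 0 instead of NaN."""
    if per_column:
        # one min/max per column, shape (1, n_columns)
        x_min = x.min(dim=0, keepdim=True).values
        x_max = x.max(dim=0, keepdim=True).values
    else:
        # a single min/max for the whole array
        x_min, x_max = x.min(), x.max()
    # clamp the range so a zero range does not cause division by zero
    return (x - x_min) / (x_max - x_min).clamp(min=eps)

x = torch.tensor([[0.0, 5.0, 2.0],
                  [2.0, 5.0, 4.0],
                  [4.0, 5.0, 8.0]])
normalized = minmax_normalize(x)                          # per column
normalized_global = minmax_normalize(x, per_column=False)  # whole array
```

If you later normalize validation or test data, it is usually safer to reuse the min and max computed on the training data rather than recomputing them.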

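Then you need to define a data loader:

dl = torch.utils.data.DataLoader(NDataset, batch_size=100, shuffle=True)

epochs = 10  # number of passes over the data; pick a value that suits your training
for epoch in range(epochs):
    for x in dl:
        # x is one batch of shape (100, 1025); do something with it
        pass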
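
Since the input is also the target for an autoencoder, the loop body could look roughly like the sketch below. The model, its layer sizes, the optimizer, and the learning rate are all hypothetical placeholders, and a small random tensor stands in for the normalized 60000 x 1025 data so the example runs quickly.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for the normalized data (values already in [0, 1]).
NDataset = torch.rand(200, 1025)
dl = torch.utils.data.DataLoader(NDataset, batch_size=100, shuffle=True)

# Hypothetical toy autoencoder; layer sizes are illustrative only.
model = nn.Sequential(
    nn.Linear(1025, 64), nn.ReLU(),
    nn.Linear(64, 1025), nn.Sigmoid(),  # Sigmoid keeps outputs in [0, 1]
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(2):
    for x in dl:
        optimizer.zero_grad()
        loss = loss_fn(model(x), x)  # the input batch is also the target
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
```

Passing the tensor straight to DataLoader works because a 2-D tensor can be indexed row by row and has a length, so no separate target tensor or custom Dataset class is needed here.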

FreSch48 · December 9, 2021, 8:59am

Thanks for your help.