How to read a dataset in .mat form in pytorch

ky_Pa · March 2, 2020, 4:15am

I have two datasets stored as .mat files. I want to read them with the scipy.io and h5py libraries and use them in my program, but I don't know how. Any pointers would be appreciated, thank you. The code that sets up the datasets is as follows.

# params for source dataset
src_dataset = "maria"
src_encoder_restore = os.path.join(model_root, src_dataset + "-source-encoder-final.pt")
src_classifier_restore = os.path.join(model_root, src_dataset + "-source-classifier-final.pt")
src_model_trained = True

# params for target dataset
tgt_dataset = "sandy"
tgt_encoder_restore = os.path.join(model_root, tgt_dataset + "-target-encoder-final.pt")
tgt_model_trained = True

ptrblck · March 2, 2020, 4:26am

This post might be helpful, if you would like to read these .mat files lazily.
However, it seems your dataset might be stored completely in the file.
In that case you could read it outside of the Dataset using scipy.io, transform the arrays to tensors via torch.from_numpy, and wrap them in a TensorDataset.

ky_Pa · March 2, 2020, 4:46am

I can read the .mat file into a matrix with scipy.io, but how do I then load it in my program? Could you show me the specific code? Thank you.

ptrblck · March 2, 2020, 5:13am

Here is a small dummy example:

import scipy.io
import torch
from torch.utils.data import TensorDataset

mat = scipy.io.loadmat('test.mat')
data = mat['data'] # use the key for data here
target = mat['target'] # use the key for target here

data = torch.from_numpy(data).float()
target = torch.from_numpy(target).long() # change type to your use case

dataset = TensorDataset(data, target)

ky_Pa · March 2, 2020, 5:33am

Thank you very much, I will try it. The other .mat file was saved with -v7.3 and needs to be read with the h5py library. Could you show in detail how to do that, with code?
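
For the -v7.3 case: such files are HDF5 containers, which scipy.io.loadmat does not read, so h5py is the usual route. Below is a minimal sketch, assuming the file holds plain numeric arrays (no cell arrays or structs) and one label per sample. The function name load_v73_mat and the key names 'data' and 'target' are placeholders, not names from the files in this thread.

```python
import h5py
import torch
from torch.utils.data import TensorDataset


def load_v73_mat(path, data_key="data", target_key="target"):
    """Read two numeric arrays from a MATLAB v7.3 (HDF5) file into a TensorDataset.

    The key names are placeholders; replace them with the variable names
    stored in your own file.
    """
    with h5py.File(path, "r") as f:
        # Indexing with [()] reads the whole dataset into a NumPy array.
        # MATLAB stores arrays column-major, so h5py sees the axes reversed;
        # transposing restores the shape you had in MATLAB.
        data = f[data_key][()].T
        target = f[target_key][()].T

    data = torch.from_numpy(data).float()
    # Assumption: one label per sample, so flatten (N, 1) or (1, N) to (N,)
    target = torch.from_numpy(target).long().reshape(-1)
    return TensorDataset(data, target)
```

Calling list(f.keys()) on the open h5py.File shows which variable names the file actually contains.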

ky_Pa · March 2, 2020, 8:13am

Can I change it like this? I get an error when I do; how should I modify it?

# params for source dataset

src_dataset = scipy.io.loadmat('E:\\ADDA\\pytorch-adda-master-lab\\datasets\\lab\\maria\\mat\\test_target_domain_maria.mat')
testdata = src_dataset['testdata'] # use the key for data here
testlabel = src_dataset['testlabel'] # use the key for target here

testdata = torch.from_numpy(testdata).float()
testlabel = torch.from_numpy(testlabel).long() # change type to your use case

dataset = TensorDataset(testdata, testlabel)

src_encoder_restore = os.path.join(model_root, src_dataset + "-source-encoder-final.pt")
src_classifier_restore = os.path.join(model_root, src_dataset + "-source-classifier-final.pt")
src_model_trained = True
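
The error message is not shown in the post, but one likely cause is that src_dataset now holds the dictionary returned by scipy.io.loadmat, so src_dataset + "-source-encoder-final.pt" tries to add a dict to a string and raises a TypeError. A minimal sketch that keeps the dataset name and the loaded data in separate variables follows; the helper names load_mat_dataset and restore_paths are hypothetical, and the path and model_root in the usage comments are placeholders.

```python
import os

import scipy.io
import torch
from torch.utils.data import TensorDataset


def load_mat_dataset(mat_path, data_key="testdata", label_key="testlabel"):
    """Load a (non-v7.3) .mat file into a TensorDataset."""
    mat = scipy.io.loadmat(mat_path)  # returns a dict of NumPy arrays
    data = torch.from_numpy(mat[data_key]).float()
    # reshape(-1) turns an (N, 1) or (1, N) label array into shape (N,)
    labels = torch.from_numpy(mat[label_key]).long().reshape(-1)
    return TensorDataset(data, labels)


def restore_paths(model_root, dataset_name):
    """Build checkpoint paths from the dataset name, which must be a string."""
    encoder = os.path.join(model_root, dataset_name + "-source-encoder-final.pt")
    classifier = os.path.join(model_root, dataset_name + "-source-classifier-final.pt")
    return encoder, classifier


# Hypothetical usage; the file path and model_root are placeholders:
# src_dataset = "maria"  # stays a string
# src_data = load_mat_dataset("test_target_domain_maria.mat")
# src_encoder_restore, src_classifier_restore = restore_paths("snapshots", src_dataset)
```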

Mert_Dunver · March 29, 2022, 5:11pm

Hello, I understood the data part, but what are we supposed to use for the target = mat['target'] part? Are we supposed to put the labels there?

ptrblck · March 29, 2022, 5:25pm

Yes,

target = mat['target'] # use the key for target here

assumes you’ve stored the target values in the .mat file using the 'target' key.
If that’s not the case and e.g. you don’t have any targets, you can skip this step.
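
If it is unclear which keys a file contains, or there are no targets at all, something like the following sketch may help. It assumes a non-v7.3 file readable by scipy.io.loadmat; the function names and the default key 'data' are illustrative only.

```python
import scipy.io
import torch
from torch.utils.data import TensorDataset


def list_mat_variables(path):
    """Return the variable names stored in a .mat file, sorted alphabetically."""
    mat = scipy.io.loadmat(path)
    # loadmat also adds metadata entries such as __header__ and __version__
    return sorted(key for key in mat if not key.startswith("__"))


def dataset_without_targets(path, data_key="data"):
    """Wrap only the data array; each item is then a 1-tuple (sample,)."""
    mat = scipy.io.loadmat(path)
    data = torch.from_numpy(mat[data_key]).float()
    return TensorDataset(data)
```

With a dataset built this way, indexing returns a tuple with a single element, so unpack it as (sample,) = dataset[i].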

Mert_Dunver · March 29, 2022, 5:40pm

Can I use the tensors to train my neural network directly, or do I need to use DataLoaders?

ptrblck · March 29, 2022, 5:50pm

You can use tensors directly. DataLoaders are able to shuffle, create batches, use multiple workers to load the next batches etc. but are not necessarily needed.
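
A small sketch of both options, using random dummy tensors in place of the data and labels loaded from the .mat file (the shapes and batch size are arbitrary choices for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors standing in for the data and labels loaded from the .mat file
data = torch.randn(10, 4)
target = torch.randint(0, 2, (10,))

# Option 1: index the tensors directly, here in fixed-size slices without shuffling
batch_size = 4
manual_batches = [
    (data[i:i + batch_size], target[i:i + batch_size])
    for i in range(0, data.size(0), batch_size)
]

# Option 2: let a DataLoader handle batching and shuffling
loader = DataLoader(TensorDataset(data, target), batch_size=batch_size, shuffle=True)
loader_batches = list(loader)
```

Either list of (inputs, labels) pairs can be iterated over in a training loop; the DataLoader version reshuffles on every pass over loader.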