Load Coil20 Dataset

Hi, I want to use the Coil20 dataset for my project. I have the dataset as a MAT file. Is there any way to use this data in PyTorch?

You can use any Python package that can load MAT files (scipy.io.loadmat can), and then use torch.from_numpy to convert the loaded NumPy arrays to PyTorch tensors.
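For example, a minimal sketch (the file name and the key 'data' are assumptions; the actual key depends on how the MAT file was saved):

from scipy.io import loadmat
import torch

mat = loadmat('coil20.mat')   # placeholder path; loadmat returns a dict of variable name -> numpy array
print(mat.keys())             # inspect which variables the file actually contains
images = mat['data']          # 'data' is an assumed key; use the real one from the printout
images_t = torch.from_numpy(images).float()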

Thanks Adam! I can load MAT files into Python using scipy.io as you said. But I have one doubt: in MATLAB we read a multidimensional array as H x W x D x N, where H is the number of rows (1st dimension), W the number of columns (2nd dimension), D the depth (3rd dimension), and N the number of instances (4th dimension). However, in Python the convention is different: PyTorch expects N x D x H x W. So do I need to reshape the multidimensional array, since I have to use the data to train a convolutional neural network?

You don’t need to reshape it; you need a transpose (the permute function might be the most convenient here).
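A minimal sketch of that permute, assuming images_np is the H x W x D x N array loaded from the MAT file:

import torch

# images_np has shape (H, W, D, N) as loaded from the MAT file
images_t = torch.from_numpy(images_np).float()
images_t = images_t.permute(3, 2, 0, 1)   # -> (N, D, H, W), the layout nn.Conv2d expects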

I have used the transpose method and then converted the NumPy ndarray into a tensor. With that done, I combined the data (the features/images) and the target values using data_utils.TensorDataset and loaded the data with data_utils.DataLoader. I have used these two things in the following way:

import torch.utils.data as data_utils

train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

Is this correct?

Yes, that should work.

Yes, it is working… :smile:

Thanks Adam!
Thanks a lot… :blush:

Soniya, I also want to load .mat files containing the MNIST dataset, so could you please send your complete code for making the dataset iterable using the PyTorch DataLoader?
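A minimal sketch in the same spirit as the code above (the file name 'mnist.mat', the keys 'images' and 'labels', and the array shapes are assumptions; inspect your .mat file to find the real variable names):

from scipy.io import loadmat
import torch
import torch.utils.data as data_utils

mat = loadmat('mnist.mat')                        # placeholder path
images = torch.from_numpy(mat['images']).float()  # assumed key, e.g. shape (N, 28, 28)
labels = torch.from_numpy(mat['labels']).long().squeeze()
images = images.unsqueeze(1)                      # add a channel dimension -> (N, 1, 28, 28)

dataset = data_utils.TensorDataset(images, labels)
loader = data_utils.DataLoader(dataset, batch_size=64, shuffle=True)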

Hi everyone.
I wanted to split my MNIST training set, consisting of 60000 images, into a training set of 50000 images and a validation set of 10000 images.
I got the idea from Stack Overflow. Part of my code is given below and it’s working correctly. Hope it will be of some help. :sunglasses:

import numpy as np
import torch
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.utils.data import SubsetRandomSampler

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])

dataset = dsets.MNIST(root='./data/', transform=transform, train=True)
test_set = dsets.MNIST(root='./data/', transform=transform, train=False)

# Setting up hyper-parameters
batch_size = 32
learning_rate = 0.001
epochs = 2
shuffle_dataset = True
random_seed = 7         # so that we get the same train and val split every time
validation_split = 0.2  # fraction of the train set to use as the validation set (20%)

dataset_size = len(dataset)
indices = list(range(dataset_size))
# split = int(np.floor(validation_split * dataset_size))
# Since we want 10000 images as the validation set, we can simply set split = 10000
split = 10000

if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)

train_indices, val_indices = indices[split:], indices[:split]

# Create PyTorch data samplers and loaders
train_sampler = SubsetRandomSampler(train_indices)
val_sampler = SubsetRandomSampler(val_indices)

# Now load the dataset into train and val loaders
train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler)
val_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=val_sampler)
# The test loader is built as usual from the MNIST test set
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size)
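A minimal sketch of how these loaders could then be used in a training loop (model, criterion, and optimizer are placeholders assumed to be defined elsewhere):

for epoch in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # quick validation pass on the 10000 held-out images
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    print('epoch {}: validation accuracy = {:.4f}'.format(epoch, correct / total))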

Now, to load an image from a .mat file and convert it to a tensor:

from scipy.io import loadmat
from PIL import Image
import torchvision.transforms as t

img = loadmat(path)              # path of the .mat file to be loaded
img = img['I']                   # assuming the image is stored under the key 'I'
img = Image.fromarray(img, "L")  # assuming the mat file contained a numpy array;
                                 # we need to convert it to a PIL image
                                 # ("L" for grayscale, "RGB" for color)
img = t.ToTensor()(img)          # convert to a channels x height x width tensor
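If you have several such images, you can stack the resulting tensors and wrap them in a TensorDataset, as in the earlier posts (a minimal sketch; mat_paths and labels are placeholders):

from scipy.io import loadmat
from PIL import Image
import torch
import torch.utils.data as data_utils
import torchvision.transforms as t

tensors = []
for p in mat_paths:                      # mat_paths: placeholder list of .mat file paths
    arr = loadmat(p)['I']                # 'I' is the assumed key, as above
    tensors.append(t.ToTensor()(Image.fromarray(arr, "L")))

images = torch.stack(tensors)            # shape (N, 1, H, W)
dataset = data_utils.TensorDataset(images, labels)  # labels: placeholder LongTensor of length N
loader = data_utils.DataLoader(dataset, batch_size=32, shuffle=True)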

Hope this helps
Happy Coding