Load Coil20 Dataset

Hi, I want to use the Coil20 dataset for my project. I have the dataset as a MAT file. Is there any way to use this data in PyTorch?

You can use any Python package that can load MAT files (scipy.io.loadmat can), and then use torch.from_numpy to convert the loaded NumPy arrays to PyTorch tensors.
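For example, a minimal sketch (the file name and the key 'data' are assumptions; the actual key depends on how the MAT file was saved):

from scipy.io import loadmat
import torch

mat = loadmat('coil20.mat')   # placeholder path; loadmat returns a dict of variable name -> numpy array
print(mat.keys())             # inspect which variables the file actually contains
images = mat['data']          # 'data' is an assumed key; use the real one from the printout
images_t = torch.from_numpy(images).float()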

Thanks Adam! I can load MAT files into Python using scipy.io as you said. But I have one doubt: in MATLAB we read a multidimensional array as H x W x D x N, where H is the number of rows (1st dimension), W the number of columns (2nd dimension), D the depth (3rd dimension), and N the number of instances (4th dimension). However, in Python the convention is different: PyTorch expects N x D x H x W. So do I need to reshape the multidimensional array, since I have to use the data to train a convolutional neural network?

You don’t need to reshape it; you need a transpose (the permute function might be the most convenient here).
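A minimal sketch of that permute, assuming images_np is the H x W x D x N array loaded from the MAT file:

import torch

# images_np has shape (H, W, D, N) as loaded from the MAT file
images_t = torch.from_numpy(images_np).float()
images_t = images_t.permute(3, 2, 0, 1)   # -> (N, D, H, W), the layout nn.Conv2d expects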

I have used the transpose method and then converted the NumPy ndarray into a tensor. With that done, I combined the data (the features/images) and the target values using data_utils.TensorDataset and loaded the data with data_utils.DataLoader. I have used these two things in the following way:

import torch.utils.data as data_utils

train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

Is this correct?

Yes, that should work.

Yes, it is working… :smile:

Thanks Adam!
Thanks a lot… :blush:

Soniya, I also want to load .mat files containing the MNIST dataset, so could you please send your complete code for making the dataset iterable using the PyTorch DataLoader?
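A minimal sketch in the same spirit as the code above (the file name 'mnist.mat', the keys 'images' and 'labels', and the array shapes are assumptions; inspect your .mat file to find the real variable names):

from scipy.io import loadmat
import torch
import torch.utils.data as data_utils

mat = loadmat('mnist.mat')                        # placeholder path
images = torch.from_numpy(mat['images']).float()  # assumed key, e.g. shape (N, 28, 28)
labels = torch.from_numpy(mat['labels']).long().squeeze()
images = images.unsqueeze(1)                      # add a channel dimension -> (N, 1, 28, 28)

dataset = data_utils.TensorDataset(images, labels)
loader = data_utils.DataLoader(dataset, batch_size=64, shuffle=True)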

Hi everyone.
I wanted to split my MNIST training set, consisting of 60000 images, into a training set of 50000 images and a validation set of 10000 images.
I got the idea from Stack Overflow. Part of my code is given below and it’s working correctly. Hope it will be of some help. :sunglasses:

import numpy as np
import torch
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.utils.data import SubsetRandomSampler

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])

dataset = dsets.MNIST(root='./data/', transform=transform, train=True)
test_set = dsets.MNIST(root='./data/', transform=transform, train=False)

# Setting up hyper-parameters
batch_size = 32
learning_rate = 0.001
epochs = 2
shuffle_dataset = True
random_seed = 7         # so that we get the same train and val split every time
validation_split = 0.2  # fraction of the train set to use as the validation set (20%)

dataset_size = len(dataset)
indices = list(range(dataset_size))
# split = int(np.floor(validation_split * dataset_size))
# Since we want 10000 images as the validation set, we can simply set split = 10000
split = 10000

if shuffle_dataset:
    np.random.seed(random_seed)
    np.random.shuffle(indices)

train_indices, val_indices = indices[split:], indices[:split]

# Create PyTorch data samplers and loaders
train_sampler = SubsetRandomSampler(train_indices)
val_sampler = SubsetRandomSampler(val_indices)

# Now load the dataset into train and val loaders
train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler)
val_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=val_sampler)
# The test loader is built as usual from the MNIST test set
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size)
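A minimal sketch of how these loaders could then be used in a training loop (model, criterion, and optimizer are placeholders assumed to be defined elsewhere):

for epoch in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # quick validation pass on the 10000 held-out images
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    print('epoch {}: validation accuracy = {:.4f}'.format(epoch, correct / total))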

Now, to load an image from a .mat file and convert it to a tensor:

from scipy.io import loadmat
from PIL import Image
import torchvision.transforms as t

img = loadmat(path)              # path of the .mat file to be loaded
img = img['I']                   # assuming the image is stored under the key 'I'
img = Image.fromarray(img, "L")  # assuming the mat file contained a numpy array;
                                 # we need to convert it to a PIL image
                                 # ("L" for grayscale, "RGB" for color)
img = t.ToTensor()(img)          # convert to a channels x height x width tensor
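If you have several such images, you can stack the resulting tensors and wrap them in a TensorDataset, as in the earlier posts (a minimal sketch; mat_paths and labels are placeholders):

from scipy.io import loadmat
from PIL import Image
import torch
import torch.utils.data as data_utils
import torchvision.transforms as t

tensors = []
for p in mat_paths:                      # mat_paths: placeholder list of .mat file paths
    arr = loadmat(p)['I']                # 'I' is the assumed key, as above
    tensors.append(t.ToTensor()(Image.fromarray(arr, "L")))

images = torch.stack(tensors)            # shape (N, 1, H, W)
dataset = data_utils.TensorDataset(images, labels)  # labels: placeholder LongTensor of length N
loader = data_utils.DataLoader(dataset, batch_size=32, shuffle=True)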

Hope this helps
Happy Coding