Custom dataset for hyperspectral data (.mat file)

Hi! I am trying to make a custom dataset for HSI data (H×W×D) and its ground truth (H×W), both stored in .mat files, which I would like to use in:
train_loader = torch.utils.data.DataLoader()
I found a class skeleton like this:

from torch.utils.data.dataset import Dataset

class MyCustomDataset(Dataset):
    def __init__(self, ...):
        # stuff

    def __getitem__(self, index):
        # stuff
        return (img, label)

    def __len__(self):
        return count
I want to use it to build a 3D CNN model for classification. Could you give me some help or an example for this? Or is a custom dataset class even necessary for a CNN?
Thank you in advance!

You could pass the paths to your Dataset's __init__ function and lazily load each sample in __getitem__.
Since your data is stored in .mat files, you could try to use scipy.io.loadmat to load it.
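For reference, here is a minimal sketch of that round trip with scipy.io; the file name and variable key below are invented for the demo (your file would use the real Pavia keys):

```python
import numpy as np
import scipy.io as io

# Create a tiny stand-in .mat file (in practice you would already have Pavia.mat)
io.savemat('demo.mat', {'pavia': np.zeros((4, 4, 3), dtype=np.float32)})

# loadmat returns a dict mapping variable names to arrays
mat = io.loadmat('demo.mat')
cube = mat['pavia']   # hyperspectral cube of shape (H, W, D)
print(cube.shape)     # (4, 4, 3)
```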

Thanks for the reply. I came up with something like this:

 
import numpy as np
import scipy.io as io
import torch
from torch.utils.data.dataset import Dataset

class MyDataset(Dataset):
    def __init__(self, mat_path, gt_path):
        data = io.loadmat(mat_path)['pavia']
        data = np.uint8(data)
        self.images = torch.from_numpy(data)
        data_gt = io.loadmat(gt_path)
        self.target = torch.from_numpy(data_gt['pavia_gt'])

    def __getitem__(self, index):
        x = self.images[index]
        y = self.target[index]
        return x, y

    def __len__(self):
        return len(self.images)

mat_path = './data/Pavia.mat'
gt_path = './data/Pavia_gt.mat'
custom_dataset = MyDataset(mat_path, gt_path)
train_loader = torch.utils.data.DataLoader(dataset=custom_dataset,
                                           batch_size=64, 
                                           shuffle=True)

Now I have no idea how to use this with torch.utils.data.
Or should I make a train/test split first?

You could perform the splitting beforehand.
Based on your code, it seems that using a torch.utils.data.Subset would be easier, as you would only have to provide the corresponding indices.
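A minimal sketch of that idea, using a toy Dataset in place of your MyDataset (the 80/20 split ratio is just an example):

```python
import torch
from torch.utils.data import Dataset, Subset

class ToyDataset(Dataset):
    """Stand-in for MyDataset: ten scalar samples."""
    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)

    def __getitem__(self, index):
        return self.x[index]

    def __len__(self):
        return len(self.x)

full = ToyDataset(10)

# Shuffle the indices once, then hand disjoint index lists to Subset
indices = torch.randperm(len(full)).tolist()
split = int(0.8 * len(full))
train_set = Subset(full, indices[:split])
test_set = Subset(full, indices[split:])
print(len(train_set), len(test_set))  # 8 2
```

Each Subset can then be wrapped in its own DataLoader.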

You can just iterate the DataLoader:

for data, target in train_loader:
    data = data.to(device)
    target = target.to(device)
    optimizer.zero_grad()
    output = model(data)
    ...

It will automatically create the batches, shuffle your Dataset, and use multiprocessing to load the data (if num_workers > 0).
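To see the batching in action without your .mat files, here is a self-contained sketch with random tensors standing in for the Pavia cube and labels (shapes and class count are made up):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy data: 100 samples with 3 features each, labels in [0, 9)
data = torch.randn(100, 3)
target = torch.randint(0, 9, (100,))

loader = DataLoader(TensorDataset(data, target),
                    batch_size=64, shuffle=True, num_workers=0)

for x, y in loader:
    # 100 samples at batch_size=64 -> one batch of 64, one of 36
    print(x.shape, y.shape)
```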