Creating custom image classes for .npy - massive data loading

I am trying to train on around 200 GB of .npy files. I have a custom image class:

class CustomImageFolder(ImageFolder):
    def __init__(self, root, transform=None):
        super(CustomImageFolder, self).__init__(str(root), transform)

    def __getitem__(self, index):
        path = self.imgs[index][0]
        img = np.load(path)
        img /= 255  # normalization
        return img

root = Path(dset_dir).joinpath('ZebraFish/train/')
transform = None
train_kwargs = {'root': root, 'transform': transform}
dset = CustomImageFolder


train_dataset = dset(**train_kwargs)
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=num_workers,
                          pin_memory=True,
                          drop_last=True)

I’m getting the following error:

RuntimeError: Found 0 files in subfolders of: data/ZebraFish/train
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif

I see that the default loader function creates a PIL object. Since I’m working with .npy files, is there a simple way around this?

Is there a way to get this same functionality from the DataLoader for massive datasets of .npy files?
All the best

Sure! You don’t need to inherit from ImageFolder.
Just create your own Dataset and load your numpy arrays as you want:

import glob
import os
import numpy as np
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, root, transform=None):
        # get your numpy array paths here (recursive, to cover class subfolders)
        self.image_paths = sorted(glob.glob(os.path.join(str(root), '**', '*.npy'), recursive=True))
        self.transform = transform

    def __getitem__(self, index):
        # load the array and normalize; cast to float32 so the division works on integer data
        img = np.load(self.image_paths[index]).astype(np.float32) / 255.
        if self.transform is not None:
            img = self.transform(img)
        return img

    def __len__(self):
        return len(self.image_paths)
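
Then you can pass it to the DataLoader just like before. The root path, batch size, and worker count below are placeholder values mirroring your post, so adjust them to your setup:

from torch.utils.data import DataLoader

train_dataset = MyDataset(root='data/ZebraFish/train', transform=None)
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=64,     # placeholder value
                          shuffle=True,
                          num_workers=4,     # placeholder value
                          pin_memory=True,
                          drop_last=True)

# each batch is a stacked FloatTensor (default collate; assumes all arrays share a shape)
for batch in train_loader:
    ...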