How can I make a dataloader from npz files?


I have the ImageNet 32x32 dataset as a set of npz batch files.
How can I concatenate these batches and make one dataloader?
Also, do the npz files contain the label information too?

Please help.
Thank you!

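The .npz format is what numpy.savez writes: an archive of named arrays. The key names are chosen by whoever saved the file, so we cannot know what's inside (or whether labels are included) without opening it.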

You can use np.load to load each file and inspect it.
Once you have the numpy arrays, you can convert them to tensors via torch.from_numpy and create your Dataset.
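
A minimal sketch of that workflow, in case it helps. It assumes each batch file stores its images under a key named `data` and its labels under a key named `labels`. Those key names are an assumption, not something confirmed in this thread, so run `inspect_npz` on one of your files first and adjust them. `inspect_npz`, `load_batches` and `make_loader` are just illustrative helper names.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def inspect_npz(path):
    """Return {key: (shape, dtype)} for every array stored in an .npz archive."""
    with np.load(path) as archive:
        return {key: (archive[key].shape, archive[key].dtype) for key in archive.files}


def load_batches(paths, data_key="data", label_key="labels"):
    """Concatenate several .npz batch files into one TensorDataset.

    data_key and label_key are ASSUMED key names; change them to whatever
    inspect_npz reports for your files.
    """
    images, labels = [], []
    for path in paths:
        with np.load(path) as archive:
            images.append(archive[data_key])
            labels.append(archive[label_key])
    # Stack all batches along the sample dimension.
    x = torch.from_numpy(np.concatenate(images, axis=0))
    y = torch.from_numpy(np.concatenate(labels, axis=0))
    return TensorDataset(x, y)


def make_loader(paths, batch_size=128):
    """Wrap the concatenated batches in a single shuffling DataLoader."""
    return DataLoader(load_batches(paths), batch_size=batch_size, shuffle=True)
```

You would call `make_loader` with the list of your batch file paths. Note that this loads every batch into memory at once, which should be fine for 32x32 images if you have enough RAM, but it is a design choice rather than the only option.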


Here’s an NPZ loader I wrote for my own dataset…

Like Peter said, it’s going to be different for each archive structure… but perhaps this will be useful to get started…

import torch
import numpy as np
from pathlib import Path
from torch.utils.data import Dataset

class NPZLoader(Dataset):
    def __init__(self, path, transform=None):
        self.path = path
        # One sample per .npz file, found one directory level below `path`.
        self.files = list(Path(path).glob('*/*.npz'))
        self.transform = transform

    def __len__(self):
        return len(self.files)

    def __getitem__(self, item):
        # 'arr_0' is the default key numpy.savez gives the first unnamed array.
        numpy_array = np.load(str(self.files[item]))['arr_0']
        torch_array = torch.from_numpy(numpy_array)
        if self.transform is not None:
            torch_array = self.transform(torch_array)
        # My dataset has no labels, so a dummy label of 0 is returned.
        return torch_array, 0