Could you suggest a dataloader for numpy files?

(John1231983) #1

Hello all, I am using the code below to load my dataset. However, ImageFolder only works with image formats such as PNG.

    transform = transforms.Compose([
        transforms.ToTensor(),  # Normalize operates on tensors, so convert first
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    train_path = os.path.join('./emojis', emoji_type)
    test_path = os.path.join('./emojis', 'Test_{}'.format(emoji_type))

    train_dataset = datasets.ImageFolder(train_path, transform)
    test_dataset = datasets.ImageFolder(test_path, transform)

    train_dloader = DataLoader(dataset=train_dataset, batch_size=opts.batch_size, shuffle=True, num_workers=opts.num_workers)
    test_dloader = DataLoader(dataset=test_dataset, batch_size=opts.batch_size, shuffle=False, num_workers=opts.num_workers)

Instead of PNG images, my dataset consists of numpy array files. Could you suggest a data loader that works with numpy array files?


You could write your custom Dataset as described in the Data loading tutorial.
Basically, you would provide the image paths in the __init__ method, while loading and transforming each sample in __getitem__.
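
A minimal sketch of that idea, assuming each .npy file holds a single sample. The class name NpyFolderDataset, the `transform` argument, and the temporary demo folder are made up for illustration and are not part of any library:

```python
import glob
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class NpyFolderDataset(Dataset):
    """Hypothetical dataset: one sample per .npy file in a folder."""

    def __init__(self, root_path, transform=None):
        # Only collect the file paths here; arrays are loaded lazily.
        self.paths = sorted(glob.glob(os.path.join(root_path, "*.npy")))
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        sample = np.load(self.paths[index])  # load one array from disk
        if self.transform is not None:
            sample = self.transform(sample)  # any callable works here
        return torch.as_tensor(sample)


# Demo on throwaway data: three 64x64 float arrays in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for i in range(3):
        np.save(os.path.join(folder, "{}.npy".format(i)),
                np.random.rand(64, 64).astype(np.float32))
    loader = DataLoader(NpyFolderDataset(folder), batch_size=2, shuffle=True)
    batch = next(iter(loader))
    print(batch.shape)  # torch.Size([2, 64, 64])
```

Loading inside __getitem__ keeps memory use low for large datasets; preloading everything in __init__ is also possible if the data fits in RAM.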

(John1231983) #3

So PyTorch does not support the numpy format in its dataloader, am I right? I am asking because I want to utilize the transform augmentations in the dataloader.


torchvision uses PIL as the backend for loading and transforming images, so you would need to convert your numpy images to PIL.Images before you can apply the torchvision.transforms.

Alternatively, you could use some OpenCV methods to augment your numpy data, if you don’t want to convert them to PIL.

(John1231983) #5

Hi, I have written a custom dataloader for numpy files. It worked without transforms. However, after I added transforms, I got an error. Could you please help me check it?

import glob
import os

import numpy as np
import torch
import torch.utils.data as data


class NumpyDataset(data.Dataset):

    def __init__(self, root_path, transforms):
        self.data_numpy_list = [x for x in glob.glob(os.path.join(root_path, '*.npy'))]
        self.transforms = transforms
        self.data_list = []
        for ind in range(len(self.data_numpy_list)):
            data_slice_file_name = self.data_numpy_list[ind]
            data_i = np.load(data_slice_file_name)
            self.data_list.append(data_i)  # keep every loaded array in memory

    def __getitem__(self, index):
        self.data = np.asarray(self.data_list[index])
        self.data = np.stack((self.data, self.data, self.data))  # gray to rgb 64x64 to 3x64x64
        if self.transforms:
            self.data = self.transforms(self.data)
        return torch.from_numpy(self.data)

    def __len__(self):
        return len(self.data_numpy_list)

from torchvision import transforms
import torch.utils.data as dataloader
data_train = NumpyDataset("numpy_folder", transforms=transforms)
trainloader = dataloader.DataLoader(data_train, batch_size=1, shuffle=True)

The error is:

    self.data = self.transforms(self.data)
TypeError: 'module' object is not callable


Currently you are just passing the transforms module itself to your Dataset. Instead, you should specify which transformations you would like to apply to your data, e.g. transforms.Resize(224). You can find the available transformations in the docs. The image transformations operate on PIL.Images, so you would have to convert your numpy array to a PIL.Image before applying them.