Could you suggest a data loader for NumPy files?

John1231983 · February 11, 2019, 2:45pm

Hello all, I am using the code below to load my dataset. However, ImageFolder only works with image files such as PNGs.

    import os

    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # opts and emoji_type are defined elsewhere in the training script
    transform = transforms.Compose([
        transforms.Resize(opts.image_size),  # transforms.Scale is deprecated; Resize replaces it
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    train_path = os.path.join('./emojis', emoji_type)
    test_path = os.path.join('./emojis', 'Test_{}'.format(emoji_type))

    train_dataset = datasets.ImageFolder(train_path, transform)
    test_dataset = datasets.ImageFolder(test_path, transform)

    train_dloader = DataLoader(dataset=train_dataset, batch_size=opts.batch_size, shuffle=True, num_workers=opts.num_workers)
    test_dloader = DataLoader(dataset=test_dataset, batch_size=opts.batch_size, shuffle=False, num_workers=opts.num_workers)

Instead of PNG images, my dataset consists of NumPy array files (.npy). Could you suggest a data loader that works with NumPy array files?

ptrblck · February 11, 2019, 2:48pm

You could write a custom Dataset as described in the Data Loading tutorial.
Basically, you would provide the file paths in the __init__ method, while loading and transforming each sample in __getitem__.
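A minimal sketch of that pattern, assuming each sample is stored as its own .npy file in a single folder. The class name LazyNumpyDataset and the folder layout are hypothetical. Only the paths are collected in __init__, and each array is read with np.load inside __getitem__, so the whole dataset never has to sit in memory at once. DataLoader itself does not care about the file format; it only calls __len__ and __getitem__.

```python
import glob
import os

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class LazyNumpyDataset(Dataset):
    """Hypothetical Dataset that loads one .npy file per sample on demand."""

    def __init__(self, root_path, transform=None):
        # Store only the file paths; sorting makes the sample order deterministic.
        self.paths = sorted(glob.glob(os.path.join(root_path, "*.npy")))
        self.transform = transform  # optional callable applied to the loaded tensor

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Read the array from disk only when this sample is requested.
        array = np.load(self.paths[index])
        sample = torch.from_numpy(array).float()
        if self.transform is not None:
            sample = self.transform(sample)
        return sample
```

With a folder of equally shaped .npy files, `DataLoader(LazyNumpyDataset("numpy_folder"), batch_size=4, shuffle=True)` would then yield stacked batches; arrays of different shapes would need a custom `collate_fn`.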

John1231983 · February 11, 2019, 2:49pm

So PyTorch's DataLoader does not support the NumPy format, am I right? I'm asking because I want to use the transforms for data augmentation in the data loader.

ptrblck · February 11, 2019, 2:52pm

torchvision uses PIL as the backend for loading and transforming images, so you would need to convert your numpy images to PIL.Images to apply the torchvision.transforms.

Alternatively, you could use some OpenCV methods to augment your numpy data, if you don’t want to convert them to PIL.
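As an illustration of augmenting the arrays directly, here is a sketch that uses plain NumPy slicing in place of OpenCV (swapped in so the example has no extra dependency). The function random_flip_and_crop and its crop_size parameter are hypothetical, and a C x H x W layout is assumed, matching the 3x64x64 arrays discussed in this thread.

```python
import numpy as np


def random_flip_and_crop(array, crop_size, rng):
    """Randomly flip a C x H x W array horizontally, then take a random crop.

    crop_size is a hypothetical (height, width) tuple for the output.
    """
    if rng.random() < 0.5:
        array = array[:, :, ::-1]  # flip along the width axis
    _, h, w = array.shape
    ch, cw = crop_size
    top = rng.integers(0, h - ch + 1)   # upper bound is exclusive
    left = rng.integers(0, w - cw + 1)
    # copy() removes the negative stride from the flip; torch.from_numpy rejects negative strides
    return array[:, top:top + ch, left:left + cw].copy()


rng = np.random.default_rng(0)
sample = np.random.rand(3, 64, 64).astype(np.float32)
augmented = random_flip_and_crop(sample, (56, 56), rng)  # 3 x 56 x 56
```

The result can be passed straight to `torch.from_numpy` inside `__getitem__`.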

John1231983 · February 11, 2019, 3:57pm

Hi, I have written a custom dataset for NumPy files. It works without transforms, but when I add transforms I get the error below. Could you please help me check it?

import glob
import os

import numpy as np
import torch
from torch.utils import data
from torchvision import transforms

class NumpyDataset(data.Dataset):

    def __init__(self, root_path, transforms):
        # collect every .npy file in root_path and load all arrays into memory
        self.data_numpy_list = [x for x in glob.glob(os.path.join(root_path, '*.npy'))]
        self.transforms = transforms
        self.data_list = []
        for ind in range(len(self.data_numpy_list)):
            data_slice_file_name = self.data_numpy_list[ind]
            data_i = np.load(data_slice_file_name)
            self.data_list.append(data_i)

    def __getitem__(self, index):

        self.data = np.asarray(self.data_list[index])
        self.data = np.stack((self.data, self.data, self.data)) # gray to rgb 64x64 to 3x64x64
        if self.transforms:
            self.data = self.transforms(self.data)
        return torch.from_numpy(self.data).float()

    def __len__(self):
        return len(self.data_numpy_list)

data_train = NumpyDataset("numpy_folder", transforms=transforms)
trainloader = data.DataLoader(data_train, batch_size=1, shuffle=True)

The error is

self.data = self.transforms(self.data)
TypeError: 'module' object is not callable

ptrblck · February 11, 2019, 4:50pm

Currently you are passing the transforms module itself to your Dataset. Instead, you should specify which transformation you would like to apply to your data, e.g. transforms.Resize(224). You can find the available transformations in the docs. Whichever image transformations you pick, you would have to convert your numpy array to a PIL.Image before applying them.