Customizing dataloader: data transforms are not applied

(Wonchul Son) #1
# Customizing my dataloader
class MyDataset(torch.utils.data.Dataset):
    def __init__(self, cropped_1x32_dataset, targets):
        for i in range(32):
            setattr(self, 'data_{}'.format(i), cropped_1x32_dataset[i])
        self.targets = targets

    def __getitem__(self, index):
        # collect the sample at this index from each of the 32 crops
        xs = [getattr(self, 'data_{}'.format(i))[index] for i in range(32)]
        y = self.targets[index]
        return xs, y
    
    def __len__(self):
        return len(self.data_0)

# train
my_train_dataset = MyDataset(train_cropped_1x32_dataset, train_dataset.targets)
my_train_loader = torch.utils.data.DataLoader(dataset = my_train_dataset,
                                              batch_size = batch_size,
                                              shuffle = True,
                                              num_workers=4)
# main
def train(epoch):
    model.train()
    train_loss = 0
    total = 0
    correct = 0
    
    for batch_idx, (cropped_1x32_dataset, target) in enumerate(my_train_loader):
        for i in range(32):   
            cropped_1x32_dataset[i] = cropped_1x32_dataset[i].to(device)
        target = target.to(device)

        optimizer.zero_grad()
        output = model(cropped_1x32_dataset)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

In main, I checked the data.
But none of the transforms are applied: neither RandomHorizontalFlip nor Normalize.
How can I fix this?

Thank you.

(Juan F Montesinos) #2

Hi,
If you create a custom dataset, you have to apply those transformations yourself.

# Customizing my dataloader
class MyDataset(torch.utils.data.Dataset):
    def __init__(self, cropped_1x32_dataset, targets):
        for i in range(32):
            setattr(self, 'data_{}'.format(i), cropped_1x32_dataset[i])
        self.targets = targets

    def __getitem__(self, index):
        xs = [getattr(self, 'data_{}'.format(i))[index] for i in range(32)]
        y = self.targets[index]
        return xs, y
    
    def __len__(self):
        return len(self.data_0)

Look at your code: you are not calling the transformations anywhere.

(Wonchul Son) #3

@JuanFMontesinos
Should I call the transformations in __init__?
When I load the data, I apply the transformations as shown below.
Let me know where I should edit the code :smiley:
Thank you.

# load train data
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.4914, 0.4824, 0.4467),
                         std=(0.2471, 0.2436, 0.2616))
])

train_dataset = datasets.CIFAR10(root='../../data/cifar10/',
                                 train=True,
                                 transform=transform_train,
                                 download=True)

(Juan F Montesinos) #4

If I understood properly, you replaced the CIFAR10 dataset with your own, loading the data from the CIFAR10 dataset, right?

The point is that you are getting the data directly from the dataset's storage rather than iterating the dataset. My guess is that CIFAR10's __getitem__ is never run; consequently, the transforms are never applied.
If you want to precompute the transforms, I would recommend iterating over the dataset instead of accessing the raw data like train_dataset.data[i].
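
To make that distinction concrete, here is a minimal pure-Python stand-in (TinyDataset and the lambda transform are illustrative, not torchvision code): like datasets.CIFAR10, it only runs its transform inside __getitem__, so reading .data directly bypasses the transform entirely.

```python
# Stand-in mimicking how torchvision datasets behave: .data holds the
# raw, untransformed samples, while indexing runs the transform.
class TinyDataset:
    def __init__(self, data, transform):
        self.data = data          # raw storage, like train_dataset.data
        self.transform = transform

    def __getitem__(self, index):
        # the transform only runs here, as in datasets.CIFAR10
        return self.transform(self.data[index])

    def __len__(self):
        return len(self.data)

dataset = TinyDataset([1, 2, 3], transform=lambda x: x * 10)

# Reading raw storage bypasses the transform
raw = [dataset.data[i] for i in range(len(dataset))]        # [1, 2, 3]

# Iterating the dataset calls __getitem__, so the transform is applied
precomputed = [dataset[i] for i in range(len(dataset))]     # [10, 20, 30]
```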
If you want to apply the transforms on-the-fly, you should apply them here:

    def __getitem__(self, index):
        xs = [getattr(self, 'data_{}'.format(i))[index] for i in range(32)]
        y = self.targets[index]
        return xs, y
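
One way to wire that up is to pass the transform into the custom dataset and call it on each crop inside __getitem__. A sketch under that assumption (the transform parameter is my addition, not part of the original code; in the real script it would subclass torch.utils.data.Dataset):

```python
class MyDataset:
    def __init__(self, cropped_1x32_dataset, targets, transform=None):
        for i in range(32):
            setattr(self, 'data_{}'.format(i), cropped_1x32_dataset[i])
        self.targets = targets
        self.transform = transform  # e.g. transform_train

    def __getitem__(self, index):
        xs = [getattr(self, 'data_{}'.format(i))[index] for i in range(32)]
        if self.transform is not None:
            # apply the transform to every crop, on-the-fly, per sample
            xs = [self.transform(x) for x in xs]
        return xs, self.targets[index]

    def __len__(self):
        return len(self.data_0)

# In the training script this would become something like:
# my_train_dataset = MyDataset(train_cropped_1x32_dataset,
#                              train_dataset.targets,
#                              transform=transform_train)
```

Keep in mind that RandomHorizontalFlip and ToTensor expect PIL images, so the crops would need to still be PIL images when the transform runs; if they are already tensors, tensor-compatible transforms would be needed instead.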

(Wonchul Son) #5

@JuanFMontesinos

Thank you :smiley: