I have two ways of loading the CIFAR10 dataset. The first one is quite traditional:
trainset = torchvision.datasets.CIFAR10(root='DATA', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='DATA', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)
For the second one, I’m first loading the datasets with:
trainset = torchvision.datasets.CIFAR10(root='DATA', train=True, download=True, transform=transform_train)
testset = torchvision.datasets.CIFAR10(root='DATA', train=False, download=True, transform=transform_test)
Then I iterate over trainset and testset to apply some transformations to the labels (I have removed those transformations for now, and the two behaviors still differ):
for i in range(len(trainset)):
    img, target = trainset[i]
    train_set.append(np.array(img))
    # Do something on the labels, OR NOT
    train_lab.append(target)
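For illustration, the kind of label edit I have in mind looks roughly like this (a hypothetical remapping, not my actual transformation; `remap_label` and the class ids are made up for the example):

```python
# Hypothetical example: remap one class id to another.
def remap_label(target, src=3, dst=5):
    """Return dst when target == src, otherwise return target unchanged."""
    return dst if target == src else target

labels = [0, 3, 7, 3]
remapped = [remap_label(t) for t in labels]
# remapped == [0, 5, 7, 5]
```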
Finally, I transform this back to a dataloader with:
train_set = np.asarray(train_set, dtype="float32")
tensor_train = torch.from_numpy(train_set).float()
ptrain = torch.utils.data.TensorDataset(tensor_train, tensor_lab_train)
trainloader = torch.utils.data.DataLoader(ptrain, batch_size=128, shuffle=True)
Then I train my model by iterating over the loader with
for batch_idx, (inputs, targets) in enumerate(trainloader):
Nonetheless, the two behaviors are very different. With the first loading, every epoch takes 15 s, while it takes only 3 s with the second. At epoch 30, I observe a train accuracy of 89% with the first loading and 99% with the second. Moreover, the test accuracy of the second method scales very poorly (it is stuck at 70%). Am I missing something? What is a proper way of modifying the labels fed to a DataLoader?
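To make the question concrete, is a wrapper Dataset like the one below the idiomatic way to do this? This is just a sketch I put together: `RelabeledDataset` is my own name, and I use a small stand-in `TensorDataset` instead of CIFAR10 to keep it self-contained.

```python
import torch
from torch.utils.data import Dataset, DataLoader, TensorDataset

class RelabeledDataset(Dataset):
    """Wrap any (img, target) dataset and rewrite its labels on the fly."""
    def __init__(self, base, label_fn):
        self.base = base
        self.label_fn = label_fn

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, target = self.base[idx]
        return img, self.label_fn(target)

# Stand-in for CIFAR10: 8 random "images" with labels 0..7.
base = TensorDataset(torch.randn(8, 3, 32, 32), torch.arange(8))

# Images pass through untouched; every label is shifted by 1.
wrapped = RelabeledDataset(base, lambda t: t + 1)
loader = DataLoader(wrapped, batch_size=4, shuffle=False)
```

This way the base dataset's own transform still runs on every access, and only the labels are changed.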
Thank you very much for your help!