Why is the accuracy so different when I use an image-folder dataset instead of PyTorch's built-in dataset?

ma3252788 · August 11, 2020, 6:01am

For example, with CIFAR-10, using the dataset that comes with PyTorch directly gives an accuracy of up to 96% with a given network architecture, but after I converted CIFAR-10 into image files and tested the same network, the accuracy was only 92%. Why?

This is the previous code:

```python
import torchvision.datasets as dset

train_dataset = dset.CIFAR10(args.data_path, train=True, transform=train_transform, download=True)
test_dataset = dset.CIFAR10(args.data_path, train=False, transform=test_transform, download=True)
```

This is the modified code:

```python
import torch
from torchvision import datasets

# Same transforms as before; only the data source changes to the exported image folders.
train_dataset = datasets.ImageFolder(root='/home/ubuntu/bigdisk/DataSets/cifar10/static/orig/train/',
                                     transform=train_transform)
test_dataset = datasets.ImageFolder(root='/home/ubuntu/bigdisk/DataSets/cifar10/static/orig/test/',
                                    transform=test_transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=args.batch_size, shuffle=True,
                                           num_workers=args.prefetch, pin_memory=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=args.test_bs, shuffle=False,
                                          num_workers=args.prefetch, pin_memory=True)
```
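One thing worth ruling out when switching to `ImageFolder` is the label mapping: it assigns class indices by sorting the subfolder names, so the train and test folders need identically named class folders. A minimal sketch that mimics that sorting with the standard library and compares both folders against the CIFAR-10 label order; the helper names and the hard-coded `CIFAR10_CLASSES` list are illustrative assumptions, not part of the original code:

```python
import os

# CIFAR-10 label order as used by torchvision.datasets.CIFAR10 (indices 0..9).
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def folder_class_to_idx(root):
    """Mimic ImageFolder: class index = position of the subfolder name in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


def check_label_mapping(train_root, test_root):
    """Return a list of problems; an empty list means both splits match CIFAR-10's order."""
    expected = {name: idx for idx, name in enumerate(CIFAR10_CLASSES)}
    problems = []
    for split, root in (("train", train_root), ("test", test_root)):
        mapping = folder_class_to_idx(root)
        if mapping != expected:
            problems.append(f"{split}: {mapping}")
    return problems
```

Once the datasets are built, `train_dataset.class_to_idx` and `test_dataset.class_to_idx` can also be printed and compared directly.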

ptrblck · August 12, 2020, 4:12am

Is this behavior reproducible, i.e. are you seeing better performance with the torchvision dataset if you rerun the code multiple times with different seeds?

If so, are you using the “exact” same dataset (the same number of images, or might some be missing/corrupt), the same transformations, and the same model?
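The seed question can be answered by training each variant with a few seeds and checking whether the accuracy gap is larger than the run-to-run spread. A minimal sketch, assuming you record the final test accuracy of each run yourself; `seed_everything`, `summarize`, and `gap_exceeds_noise` are hypothetical helpers, the `k=2.0` threshold is an arbitrary heuristic rather than a statistical test, and NumPy/PyTorch seeding is skipped if those packages are not importable:

```python
import random
import statistics


def seed_everything(seed):
    """Seed Python's RNG and, when available, NumPy's and PyTorch's RNGs."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the RNG on all devices (CPU and CUDA)
    except ImportError:
        pass


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-seed accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def gap_exceeds_noise(accs_a, accs_b, k=2.0):
    """Heuristic: is the gap in means larger than k times the larger per-variant std?"""
    mean_a, std_a = summarize(accs_a)
    mean_b, std_b = summarize(accs_b)
    return abs(mean_a - mean_b) > k * max(std_a, std_b)
```

Call `seed_everything(seed)` at the start of each run, then pass the lists of accuracies for the torchvision dataset and the image folders to `gap_exceeds_noise`.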

ma3252788 · August 12, 2020, 6:37am

Thank you for your reply! As shown in the code above, I only changed where the data is loaded from: instead of ~/DATASETS/cifar.python it now uses root='/home/ubuntu/bigdisk/DataSets/cifar10/static/orig/train/'. Everything else, such as train_transform, is unchanged; only the path differs. With cifar.python the accuracy exceeds 96% every time, while with the images exported from cifar.python it is only 92% every time.
These are the exported images: 50000 for training and 10000 for testing.
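To confirm the 50000/10000 split also holds per class (CIFAR-10 has 5000 training and 1000 test images per class), you could count the files in each class folder. A minimal sketch, assuming the `root/<class>/<image>` layout that `ImageFolder` expects; the function names and the extension set are illustrative, and this only catches missing files, not corrupt ones (opening each image would be needed for that):

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def count_per_class(root):
    """Count image files in each class subfolder of an ImageFolder-style directory."""
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def missing_images(root, expected_per_class):
    """Return {class: shortfall} for classes with fewer images than expected."""
    return {name: expected_per_class - n
            for name, n in count_per_class(root).items()
            if n < expected_per_class}
```

For an intact export, `missing_images(train_root, 5000)` and `missing_images(test_root, 1000)` should both return an empty dict.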

ptrblck · August 12, 2020, 8:54am

Just to make sure I understand the use case correctly:
you are not using the torchvision.datasets.CIFAR10 data, but downloaded the cifar.python dataset from another source.
After changing the path, is the accuracy of your model consistently lower, using exactly the same setup and different seeds?

ma3252788 · August 13, 2020, 5:24am

Yes, the accuracy is lower even when I use exactly the same parameters. Also, I exported these images from cifar.python, so normally the results should be the same, right?
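Whether the results should match depends on the export being lossless: PNG stores the exact pixel values, while JPEG re-encodes them lossily. A minimal sketch of a pixel-level check, assuming you already have each original CIFAR-10 image as a `uint8` HWC array and the re-loaded exported file as another array (for example via `numpy.asarray(PIL.Image.open(path).convert("RGB"))`); `max_pixel_difference` and `fraction_changed` are hypothetical helpers:

```python
import numpy as np


def max_pixel_difference(original, reloaded):
    """Largest absolute per-channel difference between two uint8 images (0 = identical)."""
    if original.shape != reloaded.shape:
        raise ValueError(f"shape mismatch: {original.shape} vs {reloaded.shape}")
    # Cast to a signed type first so the subtraction cannot wrap around.
    diff = original.astype(np.int16) - reloaded.astype(np.int16)
    return int(np.abs(diff).max())


def fraction_changed(originals, reloaded):
    """Fraction of images whose exported copy differs in at least one pixel value."""
    changed = sum(max_pixel_difference(o, r) > 0 for o, r in zip(originals, reloaded))
    return changed / len(originals)
```

A lossless export should give `fraction_changed(...) == 0.0`; any nonzero value means the image folders are not the same data the built-in dataset provides.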

ma3252788 · August 13, 2020, 5:30am

I should also add that my dataset is torchvision.datasets.CIFAR10(args.data_path, train=True, transform=train_transform, download=True); cifar.python is just what it downloaded automatically. So this should be the official dataset from torchvision.

ptrblck · August 13, 2020, 7:53am

torchvision.datasets.CIFAR10 downloads binary files, while your dataset seems to consist of folders containing images. Did you create these folders manually from the torchvision data?
Anyway, this wouldn’t explain the issue you are seeing.
How many times did you rerun the training, and could you post a code snippet that reproduces this issue?
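For comparison with the image folders, this is roughly what the downloaded binary files contain. torchvision's CIFAR10 reads the pickled batches extracted from `cifar-10-python.tar.gz`; each holds a 10000 x 3072 `uint8` array stored channel-first (1024 red, then 1024 green, then 1024 blue values per image) plus a `labels` list, and torchvision reshapes the data to 32 x 32 x 3 HWC images before wrapping each one in a PIL image. The sketch below mirrors that reshape; `load_cifar_batch` is a hypothetical helper, not a torchvision API, and it only handles the CIFAR-10 key names:

```python
import pickle

import numpy as np


def load_cifar_batch(path):
    """Decode one CIFAR-10 python batch into (images[N, 32, 32, 3] uint8, labels list)."""
    with open(path, "rb") as f:
        # The batches were pickled under Python 2, hence the latin1 encoding.
        entry = pickle.load(f, encoding="latin1")
    data = np.asarray(entry["data"], dtype=np.uint8)
    # Each row is [R plane | G plane | B plane]; reshape to CHW, then move channels last.
    images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    return images, list(entry["labels"])
```

Comparing these arrays with the re-loaded exported files is one way to check whether both pipelines really see identical pixels.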