jS5t3r
(Peter Lorenz)
March 24, 2022, 9:18pm
1
I selected a subset of classes for the validation set by creating symbolic links in a val folder:
class="n02841315" # binoculars
ln -s /home/from/ImageNet/val/"$class" /home/to/ImageNetHierarchy/val/"$class"
I use this dataloader:
import torch
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
dataset_dir_path = "/home/to/ImageNetHierarchy/val/"
data_loader = torch.utils.data.DataLoader(datasets.ImageFolder(dataset_dir_path, transform), batch_size=64, shuffle=True, num_workers=num_workers, pin_memory=True)
When I try to evaluate a pretrained model such as
import torchvision
model = torchvision.models.wide_resnet50_2(pretrained=True)
model.eval()
model.cuda()
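correct, total = 0, 0
with torch.no_grad():
    for images, labels in data_loader:
        images = images.cuda()
        labels = labels.cuda()
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        # Compare per sample: `if predicted == labels` is ambiguous for a whole batch.
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
print(f"accuracy: {correct / total:.4f}")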
The model hardly ever predicts the correct class.
ImageFolder creates the class indices (i.e. the targets) based on the available folders.
If I understand your use case correctly, you’ve only slimmed down the validation dataset, while the model was still trained on all 1000 classes?
In that case you are breaking the class correspondence, as the mapping could look like this (using random class names):
# train
apple - class0
bird - class1
duck - class2
eagle - class3
wolf - class4
# val
bird - class0
eagle - class1
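The index assignment can be reproduced without any images. The sketch below mimics how ImageFolder numbers its class folders (sorted folder names, counted from 0), using the made-up class names above. `folder_class_to_idx` is a hypothetical helper written for this illustration, not a torchvision function.

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Number the class folders like ImageFolder does: sorted names -> 0..N-1."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    return {name: i for i, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as tmp:
    train_root = os.path.join(tmp, "train")
    val_root = os.path.join(tmp, "val")
    # Full set of (made-up) classes for training, a subset for validation.
    for name in ["apple", "bird", "duck", "eagle", "wolf"]:
        os.makedirs(os.path.join(train_root, name))
    for name in ["bird", "eagle"]:
        os.makedirs(os.path.join(val_root, name))
    train_map = folder_class_to_idx(train_root)
    val_map = folder_class_to_idx(val_root)

# The same class name ends up with a different index in each directory.
for name, val_idx in val_map.items():
    print(f"{name}: val index {val_idx}, train index {train_map[name]}")
```

Here "eagle" is labeled 1 in the validation folder while the model was trained with "eagle" as 3, so every comparison between prediction and target is made against the wrong index.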
If you want to manipulate the validation dataset, I would recommend creating a custom Dataset and making sure the remaining folders get the class labels the model expects.
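One possible way to provide the expected labels, as a minimal sketch: number the full set of class folders the same way ImageFolder does, then translate the subset indices through ImageFolder's `target_transform` argument. This assumes a directory `full_root` that still contains all 1000 class folders, and that the pretrained model's output indices follow the sorted folder order. `list_class_dirs`, `build_index_remap`, `IndexRemap`, and `make_subset_val_dataset` are hypothetical names, not part of torchvision.

```python
import os


def list_class_dirs(root):
    """Sorted sub-directory names; symlinked directories are included."""
    return sorted(e.name for e in os.scandir(root) if e.is_dir())


def build_index_remap(full_root, subset_root):
    """Return a list r where r[subset_index] is the index in the full class list."""
    full_idx = {name: i for i, name in enumerate(list_class_dirs(full_root))}
    return [full_idx[name] for name in list_class_dirs(subset_root)]


class IndexRemap:
    """Top-level callable, so it can be pickled for DataLoader worker processes."""

    def __init__(self, remap):
        self.remap = remap

    def __call__(self, target):
        return self.remap[target]


def make_subset_val_dataset(full_root, subset_root, transform=None):
    # Imported here so the helpers above stay usable without torchvision.
    from torchvision import datasets

    remap = build_index_remap(full_root, subset_root)
    return datasets.ImageFolder(
        subset_root, transform=transform, target_transform=IndexRemap(remap)
    )
```

With the paths from the question, this would be called as `make_subset_val_dataset("/home/from/ImageNet/val", "/home/to/ImageNetHierarchy/val", transform)`, assuming the first directory still holds every class folder.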
jS5t3r
(Peter Lorenz)
March 25, 2022, 8:34am
3
Yes, the training set still has all 1000 classes. The validation set has far fewer, which is why the labels are corrupted.
What is the best way to create such a dataloader?
You can create your custom Dataset that returns the expected value corresponding to the original 1000 classes:
import torchvision

class MyImageFolder(torchvision.datasets.ImageFolder):
    def __init__(self, img_path, transform=None):
        super(MyImageFolder, self).__init__(img_path, transform)
        self.classes, self.class_to_idx = self._my_classes()
        self.samples = self._make_dataset(self.samples)
        self.imgs = self.samples
        self.targets = [s[1] for s in self.samples]

    def _my_classes(self):
        classes = ['duck', 'wolf']
        class_to_idx = {classes[i]: i for i in range(len(classes))}
        return classes, class_to_idx

    def _make_dataset(self, samples):
        n = len(samples)
        ds = [None] * n
        for i, (img, cls) in enumerate(samples):
            ds[i] = (img, self._custom_class(cls))
        return ds

    def _custom_class(self, cls):
        if cls == 0:
            return self.classes[0]
        if cls == 1:
            return self.classes[1]
        else:
            return 'not_my_favorite_class'
This would be a slight variation of the answer given here.
The internal classes and class_to_idx attributes are used in DatasetFolder's __init__ to create the samples, as seen in these lines of code.
Once ImageFolder has been initialized (DatasetFolder is its parent class), changing these values no longer has any effect on the samples, so you might want to derive your custom class from DatasetFolder and change these attributes in the __init__ method.
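A hedged sketch of a related approach: instead of editing the attributes by hand, the subclass overrides `find_classes`, the hook that recent torchvision releases call from `DatasetFolder.__init__` before the samples are collected. This is a different technique from changing the attributes inside `__init__`, and it assumes a torchvision version that exposes `find_classes`. `full_classes` is assumed to be the sorted list of all 1000 class folder names; `subset_class_to_idx` and `make_subset_image_folder` are hypothetical names.

```python
import os


def subset_class_to_idx(directory, full_classes):
    """Classes present in `directory`, indexed by their position in `full_classes`."""
    full_idx = {name: i for i, name in enumerate(full_classes)}
    present = sorted(e.name for e in os.scandir(directory) if e.is_dir())
    return present, {name: full_idx[name] for name in present}


def make_subset_image_folder(root, full_classes, transform=None):
    # Deferred import: the helper above has no torchvision dependency.
    from torchvision.datasets import ImageFolder

    class SubsetImageFolder(ImageFolder):
        # Runs during DatasetFolder.__init__, so the custom indices end up
        # in self.samples and self.targets.
        def find_classes(self, directory):
            return subset_class_to_idx(directory, full_classes)

    # Caveat: a class defined inside a function cannot be pickled, which
    # matters if DataLoader workers are started with the "spawn" method.
    return SubsetImageFolder(root, transform=transform)
```

The list for `full_classes` could come, for example, from the sorted folder names of a directory that still contains every class.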
Hope this helps