I am following the ImageFolder+DataLoader tutorial to load data and assign labels. The code is:
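```python
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Per-channel normalization, applied after ToTensor()
normalize = T.Normalize(mean=[0.4, 0.4, 0.4], std=[0.2, 0.2, 0.2])
transform = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    normalize,
])
# Labels are taken from the subfolder names under data/dogcat/
dataset = ImageFolder('data/dogcat/', transform=transform)
# `sampler` is defined elsewhere (not shown); without a custom sampler, drop it and pass shuffle=True
dataloader = DataLoader(dataset, batch_size=3, sampler=sampler, num_workers=0, drop_last=False)
```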
It works perfectly when there are two folders under the target directory, like this:
```
data/dogcat/
|-- cat
|   |-- cat.12484.jpg
|   |-- cat.12485.jpg
|   |-- cat.12486.jpg
|   `-- cat.12487.jpg
`-- dog
    |-- dog.12496.jpg
    |-- dog.12497.jpg
    |-- dog.12498.jpg
    `-- dog.12499.jpg
```
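For reference, ImageFolder derives the labels from the folder names: every immediate subdirectory of the root is one class, the class names are sorted, and they are numbered from 0 (the result is exposed as `dataset.class_to_idx`). The sketch below mimics that rule in plain Python so it runs without torchvision; `folder_class_to_idx` is a hypothetical helper, not a torchvision function.

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Mimic ImageFolder's labelling rule (a sketch, not torchvision itself):
    each immediate subdirectory is a class, sorted by name, numbered from 0."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name in ["dog", "cat"]:
            os.makedirs(os.path.join(root, name))
        print(folder_class_to_idx(root))  # {'cat': 0, 'dog': 1}
        # Extra folders become extra classes; they do not extend existing ones.
        for name in ["newcat", "newdog"]:
            os.makedirs(os.path.join(root, name))
        print(folder_class_to_idx(root))  # {'cat': 0, 'dog': 1, 'newcat': 2, 'newdog': 3}
```

Note that plain files directly under the root are ignored; only directories count as classes.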
Now I have collected more dog/cat images and organized them like this:
```
data/dogcat/
|-- cat
|   |-- cat.12484.jpg
|   |-- cat.12485.jpg
|   |-- cat.12486.jpg
|   `-- cat.12487.jpg
|-- dog
|   |-- dog.12496.jpg
|   |-- dog.12497.jpg
|   |-- dog.12498.jpg
|   `-- dog.12499.jpg
|-- newcat
|   |-- newcat.12484.jpg
|   `-- newcat.12485.jpg
`-- newdog
    |-- newdog.12496.jpg
    `-- newdog.12497.jpg
```
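With this layout, ImageFolder sees four classes (`cat`=0, `dog`=1, `newcat`=2, `newdog`=3) rather than two. One way to switch between loading strategies without moving files is to describe each strategy as a mapping from folder name to training label, leaving out any folder to skip, and then filter and relabel `dataset.samples` (ImageFolder's list of `(path, class_index)` pairs). The sketch below works on such a list in plain Python; `STRATEGIES` and `regroup` are hypothetical names, not part of torchvision.

```python
# Hypothetical strategy table: folder name -> training label.
# Folders missing from a strategy are excluded from that experiment.
STRATEGIES = {
    "cat_vs_alldog": {"cat": 0, "dog": 1, "newdog": 1},
    "allcat_vs_alldog": {"cat": 0, "newcat": 0, "dog": 1, "newdog": 1},
}


def regroup(samples, classes, mapping):
    """Filter and relabel ImageFolder-style samples (a sketch).

    samples: list of (path, class_index) pairs, like ImageFolder.samples
    classes: folder names indexed by class_index, like ImageFolder.classes
    mapping: folder name -> new label for the chosen strategy
    Returns (indices, new_targets): positions of the kept samples in
    `samples` and their new labels, in the same order.
    """
    indices, new_targets = [], []
    for i, (_path, class_index) in enumerate(samples):
        folder = classes[class_index]
        if folder in mapping:
            indices.append(i)
            new_targets.append(mapping[folder])
    return indices, new_targets
```

With a real `dataset = ImageFolder(...)`, `indices, _ = regroup(dataset.samples, dataset.classes, STRATEGIES["allcat_vs_alldog"])` followed by `torch.utils.data.Subset(dataset, indices)` selects the images for that strategy. The new labels still have to be applied, for example through ImageFolder's `target_transform` argument, which is called with the original class index.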
May I ask:
1. Is there a quick way to switch between different loading strategies, such as feeding cat vs. “dog+newdog” to the model, or “cat+newcat” vs. “dog+newdog”?
2. I think newdog and newcat have higher quality but fewer images. Is there a way to give them higher weights while trying the different loading strategies from point 1? The difficulty is that the overweighting should apply not to the whole dog or cat class but only to the new subfolder. I find this hard to implement with a sampler in DataLoader, as in: Balanced Sampling between classes with torchvision DataLoader
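For the second point, one option is to weight individual samples rather than classes: each image gets a weight that depends on its original subfolder (so only `newcat`/`newdog` are boosted), optionally rescaled so that every training label carries the same total weight, and the resulting list goes to `torch.utils.data.WeightedRandomSampler`. The sketch below only computes the weights in plain Python; the name `subfolder_weights` and the weight values in the example are assumptions to be tuned, not recommendations.

```python
from collections import defaultdict


def subfolder_weights(folders, labels, folder_weight, balance_labels=True):
    """Per-sample weights for a weighted sampler (a sketch).

    folders: original subfolder of each sample (e.g. "cat", "newcat")
    labels: training label of each sample, same order as `folders`
    folder_weight: folder name -> relative weight; unlisted folders get 1.0
    balance_labels: if True, rescale so every label has the same total
        weight while keeping the ratio between subfolders within a label.
    """
    weights = [folder_weight.get(f, 1.0) for f in folders]
    if balance_labels:
        totals = defaultdict(float)
        for w, y in zip(weights, labels):
            totals[y] += w
        weights = [w / totals[y] for w, y in zip(weights, labels)]
    return weights


if __name__ == "__main__":
    folders = ["cat", "cat", "newcat", "dog", "dog", "dog", "newdog"]
    labels = [0, 0, 0, 1, 1, 1, 1]
    # Each newcat/newdog image is drawn 3x as often as an old image of its label.
    w = subfolder_weights(folders, labels, {"newcat": 3.0, "newdog": 3.0})
    print([round(x, 3) for x in w])
```

With `torch` available, `WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)` can then be passed as `sampler=` to the DataLoader, leaving `shuffle` unset since the two options are mutually exclusive. The weights must follow the order of whatever dataset the DataLoader iterates, for example a `Subset` that keeps only the folders used by the current strategy.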
Thanks so much!