How to see the distribution of labels in a dataset

I would like to see how many samples of each label are in this random subset. When I run the following line, it gives me an error:

print(dict(Counter(train_dataset_subset.targets)))

Below is my code:

# import the required modules
import torch
import torchvision
from torchvision.datasets import CIFAR10
from collections import Counter

trainset = CIFAR10(root='./data', train=True, download=True, transform=torchvision.transforms.ToTensor())

subset_size = 3000
train_dataset_subset = torch.utils.data.random_split(trainset, [subset_size, len(trainset)-subset_size])[0]

How do I go about solving this? Is there a way to access trainset.targets when the dataset is wrapped by torch.utils.data.random_split?


Yes, you could access the internal .dataset attribute:

Counter(train_dataset_subset.dataset.targets)

When I try that it gives me the following:

Counter({0: 5000, 1: 5000, 2: 5000, …, 9: 5000})

However, I need the counts for train_dataset_subset itself, something like this:

Counter({0: 300, 1: 300, 2: 300, …, 9: 300})

Each count is about 300 because the 3000 samples are split over the 10 labels (only roughly, since the split is random). How would I go about doing this? I specifically want the counts for the subset, not the original dataset.

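You won't be able to access it directly. random_split returns Subset objects, which only store the selected indices and look each sample up in the original dataset when it is requested, so a Subset has no .targets attribute of its own. You would need to iterate the Subset and count the labels yourself.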
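
A minimal sketch of that idea, not a definitive implementation. To avoid downloading CIFAR10, it uses a small hypothetical stand-in dataset (ToyLabelledDataset, an assumption made for illustration) that exposes a .targets list the way CIFAR10 does. The helper names count_labels_by_iterating and count_labels_by_indices are also made up here. The second helper is an alternative that was not suggested in the thread: it reads the labels through the Subset's .indices attribute and assumes the wrapped dataset has .targets.

```python
from collections import Counter

import torch
from torch.utils.data import Dataset, random_split


class ToyLabelledDataset(Dataset):
    """Hypothetical stand-in for CIFAR10: 10 classes, 50 samples per class."""

    def __init__(self, num_classes=10, per_class=50):
        # Like CIFAR10, labels are kept in a plain Python list called .targets.
        self.targets = [c for c in range(num_classes) for _ in range(per_class)]
        self.data = torch.zeros(len(self.targets), 3, 4, 4)

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]


def count_labels_by_iterating(subset):
    # Walk the Subset sample by sample. This works for any dataset,
    # but it loads (and transforms) every image just to read its label.
    return Counter(int(subset[i][1]) for i in range(len(subset)))


def count_labels_by_indices(subset):
    # Assumption: the wrapped dataset exposes .targets (CIFAR10 does).
    # Only the labels at the subset's indices are read; no images are loaded.
    targets = subset.dataset.targets
    return Counter(int(targets[i]) for i in subset.indices)


dataset = ToyLabelledDataset()
subset_size = 100
subset, _ = random_split(dataset, [subset_size, len(dataset) - subset_size])

iter_counts = count_labels_by_iterating(subset)
index_counts = count_labels_by_indices(subset)
print(dict(sorted(index_counts.items())))
```

With the code from the question, the same helpers could be called on train_dataset_subset. Because random_split picks indices at random, the per-label counts will be close to 300 but usually not exactly 300.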