Change labels in Data Loader


(Shangeth Rajaa) #1

I have a dataset of images and labels. I took a subset of it and want to change the labels of the whole subset to a single label.

E.g., MNIST has labels 0, 1, 2, 3, 4, 5, 6, 7, 8, 9; let’s say I want the labels 5, 6, 7, 8, 9 to all become 5, so the final labels are 0, 1, 2, 3, 4, 5.

How can I do this?


#2

You could set the new value using a condition on your targets:

from torchvision import datasets, transforms

dataset = datasets.MNIST(
    root='PATH',
    transform=transforms.ToTensor()
)

# Collapse every label above 5 into class 5
dataset.targets[dataset.targets > 5] = 5
print(dataset.targets.unique())
# tensor([0, 1, 2, 3, 4, 5])
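
As a small sketch of the same idea without downloading MNIST, the snippet below uses a stand-in tensor in place of dataset.targets. The helper name merge_high_labels is hypothetical and only for illustration.

```python
import torch

# Stand-in for dataset.targets (a 1-D tensor of integer class labels).
targets = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 7])

# torch.clamp caps every value at 5, which gives the same result as
# the boolean-mask assignment targets[targets > 5] = 5.
merged = torch.clamp(targets, max=5)


def merge_high_labels(label, cap=5):
    """Hypothetical helper: map any label above `cap` to `cap`."""
    return min(int(label), cap)
```

A function like this could also be passed as target_transform=merge_high_labels when constructing datasets.MNIST, which would leave the stored labels untouched and remap them each time a sample is fetched.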

(Shangeth Rajaa) #3

AttributeError: 'MNIST' object has no attribute 'targets'


#4

In older torchvision versions, you had to use train_labels or test_labels, depending on whether the train argument was set to True or False, respectively.
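
If the code has to run against both old and new versions, a fallback lookup along these lines might help. This is only a sketch: get_targets is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real dataset instances.

```python
from types import SimpleNamespace


def get_targets(dataset, train=True):
    """Hypothetical helper: return the label container of a dataset.

    Prefers the newer `targets` attribute and falls back to the older
    `train_labels` / `test_labels` names when `targets` is missing.
    """
    if hasattr(dataset, "targets"):
        return dataset.targets
    name = "train_labels" if train else "test_labels"
    return getattr(dataset, name)


# Stand-ins for datasets created by a newer and an older torchvision.
new_style = SimpleNamespace(targets=[0, 1, 2])
old_style_train = SimpleNamespace(train_labels=[3, 4])
old_style_test = SimpleNamespace(test_labels=[5, 6])

print(get_targets(new_style))
print(get_targets(old_style_train, train=True))
print(get_targets(old_style_test, train=False))
```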


(Boming(Tony) Zhang) #5

Hi, I am wondering whether there is a way to access the targets attribute of a dataset loaded with ImageFolder. I have a training set with 6 classes: building, forest, sea, street, glacier, and mountain. I only want to keep the forest class label and mark the rest as not forest. I tried this:

dataset.targets[dataset.targets != 1] = 0

which didn’t work: it said the dataset doesn’t have the attribute targets.


#6

Maybe you are using an older version.
Could you update torchvision and check that attribute again?
Also note that dataset.targets is a Python list in ImageFolder, so this indexing won’t work; you should convert it to a tensor first:

dataset = datasets.ImageFolder(root='PATH')
dataset.targets = torch.tensor(dataset.targets)
dataset.targets[dataset.targets==0] = 1
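
For the forest vs. not-forest case, the class index can be looked up by name instead of being hard-coded. The sketch below uses stand-ins for dataset.class_to_idx and dataset.targets, assuming ImageFolder assigned indices to the six folder names in sorted order; the label values are made up for illustration.

```python
import torch

# Stand-in for dataset.class_to_idx (folder names sorted alphabetically).
class_to_idx = {
    "building": 0, "forest": 1, "glacier": 2,
    "mountain": 3, "sea": 4, "street": 5,
}
# Stand-in for dataset.targets, which ImageFolder stores as a Python list.
targets = [0, 1, 1, 2, 3, 4, 5, 1]

forest_idx = class_to_idx["forest"]

# Convert the list to a tensor, then build binary labels:
# 1 where the sample is a forest image, 0 for every other class.
targets = torch.tensor(targets)
binary_targets = (targets == forest_idx).long()
print(binary_targets.tolist())
```

This only computes the new labels; where they need to be stored so that the dataset actually returns them is a separate question.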

(Boming(Tony) Zhang) #7

Thanks for replying to me. I used conda update torchvision and my torchvision version is 0.2.1 now. It’s still not working. Is that the latest?


#8

Could be. I usually just install torchvision from source, as it is really easy and gives you all the new features.
You would have to clone the repo and just run python setup.py install as described here.


(Boming(Tony) Zhang) #9

Thanks! I installed from source and it’s working now! Any idea why the pip and conda distributions don’t have 0.2.3 right now?


(Boming(Tony) Zhang) #10

Hi, I have a follow-up question on that. I successfully converted the targets attribute, but when I load the data with a DataLoader, it still returns the original targets.

train_set_two = datasets.ImageFolder(train_dir, transform=transform)
train_set_two.targets = torch.tensor(train_set_two.targets)
test_set_two.targets = torch.tensor(test_set_two.targets)
train_set_two.targets[train_set_two.targets > 1] = 1
test_set_two.targets[test_set_two.targets > 1] = 1

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_set_two, batch_size=batch_size,
    sampler=train_sampler, num_workers=num_workers)

After this chunk of code, I visualized my train_loader, which still has 6 classes instead of 2. Any idea?
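
A quick way to check which labels a loader really yields, instead of inspecting images, is to iterate over it once and collect the labels. This is a minimal sketch: labels_seen is a hypothetical helper, and a TensorDataset with made-up labels stands in for the image dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def labels_seen(loader):
    """Hypothetical helper: sorted list of distinct labels the loader yields."""
    seen = set()
    for _, labels in loader:
        seen.update(labels.tolist())
    return sorted(seen)


# Stand-in data: 12 dummy samples with labels cycling through 6 classes.
features = torch.zeros(12, 3)
labels = torch.tensor([0, 1, 2, 3, 4, 5] * 2)
loader = DataLoader(TensorDataset(features, labels), batch_size=4)

print(labels_seen(loader))
```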


#11

Oh, right. Internally, ImageFolder seems to read each (path, label) pair from dataset.samples, so changes to dataset.targets are ignored when samples are loaded. You could try the following code:

train_set_two.samples = [(d, 1) if s > 1 else (d, s) for d, s in train_set_two.samples]
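
The remapping above can be sketched on a plain list of (path, label) pairs, which is the shape dataset.samples is assumed to have here. The file names and the helper merge_labels are hypothetical.

```python
# Stand-in for train_set_two.samples: a list of (path, class_index) pairs.
samples = [("a.jpg", 0), ("b.jpg", 1), ("c.jpg", 3), ("d.jpg", 5)]


def merge_labels(samples, threshold=1):
    """Hypothetical helper: map every label above `threshold` to `threshold`."""
    return [(path, min(label, threshold)) for path, label in samples]


new_samples = merge_labels(samples)

# Rebuild the label list from the remapped pairs so that a separate
# targets attribute stays consistent with samples.
new_targets = [label for _, label in new_samples]

print(new_samples)
print(new_targets)
```

Assigning new_samples back to train_set_two.samples and new_targets to train_set_two.targets would keep the two attributes in agreement, which matters if a sampler is built from the targets.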