Change labels in Data Loader

I have a dataset of images and labels. I took a subset of it and want to change the labels of the whole subset to a single label.

E.g., MNIST has the labels 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Let's say I want labels 5, 6, 7, 8, 9 to all become 5, so that the final labels are 0, 1, 2, 3, 4, 5.

How can I do it?

You could set the new value using a condition on your targets:

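from torchvision import datasets, transforms

dataset = datasets.MNIST(
    root='PATH',
    transform=transforms.ToTensor()
)

# Every label above 5 (6, 7, 8, 9) becomes 5; 5 itself is unchanged
dataset.targets[dataset.targets > 5] = 5
print(dataset.targets.unique())
# tensor([0, 1, 2, 3, 4, 5])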
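The masked assignment above can be tried on a small stand-in tensor without downloading MNIST. This sketch uses made-up labels and also shows `clamp(max=5)`, which is an equivalent one-liner for this particular "merge everything above 5 into 5" case:

```python
import torch

# Stand-in for dataset.targets: a few made-up MNIST-style labels (0-9)
targets = torch.tensor([0, 3, 5, 6, 7, 9, 2, 8])

# Masked in-place assignment, as in the snippet above (on a copy)
masked = targets.clone()
masked[masked > 5] = 5

# Equivalent here: clamp every label above 5 down to 5
clamped = targets.clamp(max=5)

print(masked.tolist())               # [0, 3, 5, 5, 5, 5, 2, 5]
print(torch.equal(masked, clamped))  # True
```

`clamp` only fits contiguous "cap at N" merges; for arbitrary groupings the boolean mask is the more general tool.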

AttributeError: 'MNIST' object has no attribute 'targets'

In older torchvision versions, you had to use train_labels or test_labels, depending on whether the train argument was set to True or False, respectively.
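If code has to run on both older and newer torchvision versions, a small fallback can pick whichever attribute exists. This is a hypothetical helper (`get_targets` is not a torchvision function), exercised here on stand-in objects rather than real datasets:

```python
from types import SimpleNamespace


def get_targets(dataset):
    """Hypothetical helper: return the label container of an MNIST-style dataset.

    Newer torchvision exposes `.targets`; older versions used `.train_labels`
    or `.test_labels`, depending on the `train` flag.
    """
    if hasattr(dataset, "targets"):
        return dataset.targets
    if getattr(dataset, "train", True):
        return dataset.train_labels
    return dataset.test_labels


# Stand-in objects mimicking the new and old attribute layouts
new_style = SimpleNamespace(targets=[0, 1, 2])
old_train = SimpleNamespace(train=True, train_labels=[3, 4], test_labels=[5])
old_test = SimpleNamespace(train=False, train_labels=[3, 4], test_labels=[5])

print(get_targets(new_style), get_targets(old_train), get_targets(old_test))
# [0, 1, 2] [3, 4] [5]
```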

Hi, is there a way to access the targets attribute of a dataset loaded with ImageFolder? I have a training set with 6 classes: building, forest, sea, street, glacier, and mountain. I only want to keep the forest class label and mark the rest as unforest. I tried this:

dataset.targets[dataset.targets != 1] = 0

which didn't work, because it said the dataset doesn't have the attribute targets.

Maybe you are using an older version.
Could you update torchvision and check that attribute again?
Also note that dataset.targets is a Python list in ImageFolder, so this indexing won't work and you should cast it to a tensor first:

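import torch
from torchvision import datasets

dataset = datasets.ImageFolder(root='PATH')
# ImageFolder stores targets as a Python list; cast it to a tensor for masked indexing
dataset.targets = torch.tensor(dataset.targets)
dataset.targets[dataset.targets==0] = 1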
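For the forest/unforest case in the question, it may be safer to look the class index up in `class_to_idx` instead of hard-coding `1`. ImageFolder assigns indices to the class folders in sorted (alphabetical) order, so with the six folder names from the question "forest" lands at index 1. The sketch below uses stand-in values instead of a real ImageFolder. As a later reply in this thread points out, ImageFolder's `__getitem__` reads labels from `samples`, so changing `targets` alone does not change what the DataLoader returns.

```python
import torch

# Stand-ins for ImageFolder attributes: class folders are sorted alphabetically
classes = sorted(["building", "forest", "sea", "street", "glacier", "mountain"])
class_to_idx = {name: idx for idx, name in enumerate(classes)}
targets = [0, 1, 2, 1, 5, 3]  # ImageFolder.targets is a plain Python list

forest_idx = class_to_idx["forest"]
# 1 = forest, 0 = unforest
binary_targets = (torch.tensor(targets) == forest_idx).long()
print(forest_idx, binary_targets.tolist())  # 1 [0, 1, 0, 1, 0, 0]
```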

Thanks for replying to me. I used conda update torchvision and my torchvision version is 0.2.1 now. It’s still not working. Is that the latest?

Could be. I usually just install torchvision from source, as it is really easy and gives you all the new features.
You would have to clone the repo and run python setup.py install as described here.

Thanks! I installed from source and it’s working now! Any idea why the pip and conda distributions don’t have 0.2.3 right now?

Hi, I have a follow-up question on that. I successfully converted the targets attribute, but when I load the data with a DataLoader, it still returns the original targets.

import torch
from torchvision import datasets

train_set_two = datasets.ImageFolder(train_dir, transform=transform)
# test_set_two is created the same way from the test directory (not shown)
train_set_two.targets = torch.tensor(train_set_two.targets)
test_set_two.targets = torch.tensor(test_set_two.targets)
train_set_two.targets[train_set_two.targets > 1] = 1
test_set_two.targets[test_set_two.targets > 1] = 1

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_set_two, batch_size=batch_size,
    sampler=train_sampler, num_workers=num_workers)

After running this code, I visualized my train_loader, and it still has 6 classes instead of 2. Any idea?

Oh, right. Internally, ImageFolder's __getitem__ reads each (path, class_index) pair from dataset.samples rather than from dataset.targets, so you could try the following code:

train_set_two.samples = [(d, 1) if s > 1 else (d, s) for d, s in train_set_two.samples]
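To keep every label view consistent, one option is to update `samples` (what `__getitem__` reads) and rebuild `targets` from it. ImageFolder also sets `imgs` as an alias of `samples`, so the sketch refreshes that too. `remap_folder_labels` is a hypothetical helper, and the demo runs on a stand-in object with the same attributes rather than a real ImageFolder:

```python
from types import SimpleNamespace


def remap_folder_labels(ds, remap):
    """Hypothetical helper: relabel an ImageFolder-style dataset in place.

    `remap` maps an old class index to a new one. `samples` drives
    __getitem__; `targets` and `imgs` are refreshed so they agree with it.
    """
    ds.samples = [(path, remap(label)) for path, label in ds.samples]
    ds.targets = [label for _, label in ds.samples]
    if hasattr(ds, "imgs"):
        ds.imgs = ds.samples
    return ds


# Stand-in with the same attribute names as an ImageFolder dataset
fake = SimpleNamespace(
    samples=[("a.jpg", 0), ("b.jpg", 1), ("c.jpg", 4), ("d.jpg", 2)],
    targets=[0, 1, 4, 2],
    imgs=None,
)
remap_folder_labels(fake, lambda s: 1 if s > 1 else s)
print(fake.targets)  # [0, 1, 1, 1]
```

On a real dataset this would be called as `remap_folder_labels(train_set_two, lambda s: 1 if s > 1 else s)`, with the same remapping as the one-liner above.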

Hi. I have a custom dataset saved as a .pth file. How do I select the labels from the dataset?

How did you store the dataset and what does the pth file contain?
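The answer depends on what the file contains. As one possible case, assume it was written with `torch.save` as a dict of tensors; the keys `"data"` and `"labels"` are hypothetical. The labels can then be edited like any tensor and wrapped in a `TensorDataset`. An in-memory buffer stands in for the file so the sketch runs without one:

```python
import io

import torch
from torch.utils.data import TensorDataset

# Simulate a .pth file holding a dict of tensors (hypothetical keys)
buf = io.BytesIO()
torch.save({"data": torch.randn(6, 1, 28, 28),
            "labels": torch.tensor([0, 7, 3, 9, 5, 1])}, buf)
buf.seek(0)

checkpoint = torch.load(buf)  # with a real file: torch.load("my_dataset.pth")
labels = checkpoint["labels"]
labels[labels > 5] = 5        # same masked relabeling as for MNIST

dataset = TensorDataset(checkpoint["data"], labels)
print(dataset.tensors[1].tolist())  # [0, 5, 3, 5, 5, 1]
```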

Hi, this was really useful to me for changing the labels in EMNIST from 1-26 to 0-25.

However, do you have a link to .targets in the PyTorch documentation? The word 'target' appears so often there that I can't search for it successfully.

I assume you are referring to the first code snippet using the MNIST dataset?
If so, you can find the targets definition here.

Let me know, if I misunderstood the question.
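For the EMNIST case mentioned above (letter labels running from 1 to 26), the shift to 0-25 is a single subtraction on the targets tensor. The values below are stand-ins rather than real EMNIST data:

```python
import torch

# Stand-in for EMNIST letter targets, which run from 1 (a) to 26 (z)
targets = torch.tensor([1, 2, 26, 13])
zero_based = targets - 1  # now 0 (a) ... 25 (z)

letters = [chr(ord("a") + int(t)) for t in zero_based]
print(zero_based.tolist(), letters)  # [0, 1, 25, 12] ['a', 'b', 'z', 'm']
```

On the dataset itself this would be `dataset.targets = dataset.targets - 1` (or `dataset.targets -= 1` in place).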

TypeError: '>' not supported between instances of 'list' and 'int'

My code:

import torchvision

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True)
trainset.targets[trainset.targets > 5] = 5

You could either try to transform the targets list to a tensor via:

trainset.targets = torch.tensor(trainset.targets)

and check whether this breaks anything else, or you could create a custom Dataset and remap the targets explicitly inside it.
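The custom-Dataset option could look like the following sketch: a wrapper that remaps labels on the fly and leaves the wrapped dataset untouched. `RemapLabels` is a hypothetical name, and the demo uses a `TensorDataset` with synthetic data instead of CIFAR10:

```python
import torch
from torch.utils.data import Dataset, TensorDataset


class RemapLabels(Dataset):
    """Hypothetical wrapper that remaps labels of any (input, label) dataset.

    The wrapped dataset is not modified, so wrapping a Subset does not
    change the labels of the full dataset behind it.
    """

    def __init__(self, base, mapping):
        self.base = base
        self.mapping = mapping  # dict: old label -> new label; others pass through

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        y = int(y)  # TensorDataset yields 0-d tensors; torchvision datasets yield ints
        return x, self.mapping.get(y, y)


# Synthetic stand-in for a subset containing only classes 1 and 9
base = TensorDataset(torch.zeros(4, 3), torch.tensor([1, 9, 1, 9]))
relabeled = RemapLabels(base, {9: 0})
print([relabeled[i][1] for i in range(len(relabeled))])  # [1, 0, 1, 0]
```

Because the mapping is applied in `__getitem__`, the original `targets` list never needs to be converted to a tensor, which sidesteps the question of what else that conversion might break.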

I am also trying something similar.

import os
import matplotlib.pyplot as plt
import numpy as np

import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import Subset, DataLoader

def LoadCIFAR10_py():
    strDirPath = os.path.join(os.getcwd(), "DATA")
    # ToTensor() lets the DataLoader collate batches (PIL images cannot be batched by default)
    aTrainData = datasets.CIFAR10(strDirPath, train=True, download=True, transform=transforms.ToTensor())
    aTestData = datasets.CIFAR10(strDirPath, train=False, download=True, transform=transforms.ToTensor())
    print("DATASET: Train=", len(aTrainData), " Test=", len(aTestData))

    return aTrainData, aTestData

def CreateBinarySubset(aData, nLabel1, nLabel2):
    # CIFAR10 stores targets as a Python list; cast once for element-wise comparison
    aTargets = torch.tensor(aData.targets)
    # True at an index if the sample belongs to class nLabel1 (here 1 = automobile)
    idxLabel1 = aTargets == nLabel1
    # True at an index if the sample belongs to class nLabel2 (here 9 = truck)
    idxLabel2 = aTargets == nLabel2

    # Merge the two masks: True where the sample is an automobile or a truck, False otherwise
    index_mask = idxLabel1 | idxLabel2
    data_indices = index_mask.nonzero().reshape(-1)
    oDataSubset = Subset(aData, data_indices)
    oDataloader = DataLoader(oDataSubset, shuffle=False, batch_size=8, num_workers=2)
    return oDataloader, oDataSubset

if __name__ == "__main__":
    ## MAIN
    aTrainData, aTestData = LoadCIFAR10_py()

    ### Extract labels 1 and 9
    oTrainDataloader, oTrainDataSubset = CreateBinarySubset(aTrainData, 1, 9)
    print('Training subset: ', len(oTrainDataSubset))

    oTestDataloader, oTestDataSubset = CreateBinarySubset(aTestData, 1, 9)
    print('Testing subset: ', len(oTestDataSubset))

    # Problem:
    # Now I want to change label 9 to label 0,
    # but oTestDataSubset does not have a targets attribute.

    # This is not working:
    aTrainData.targets[aTrainData.targets == 9] = 10 ## not working

Please advise.

You can iterate the targets and manipulate them:

for idx, target in enumerate(oTestDataSubset.dataset.targets):
    if target == 9:
        oTestDataSubset.dataset.targets[idx] = 0
    
torch.tensor(oTestDataSubset.dataset.targets).unique(return_counts=True)
# (tensor([0, 1, 2, 3, 4, 5, 6, 7, 8]),
#  tensor([2000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]))
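Two notes on the snippets above. First, the loop edits `oTestDataSubset.dataset.targets`, which belongs to the full test set, so the former trucks are merged with the airplanes that already had label 0; that is why label 0 now counts 2000 samples. Second, the `== 9` line in the question fails silently instead of raising a `TypeError` like the earlier `>` example. Comparing a whole list to an int with `==` yields a single `False`, and `False` acts as index 0. A minimal illustration with made-up labels:

```python
# CIFAR10 keeps its targets as a plain Python list
targets = [6, 9, 9, 4, 1]

mask = targets == 9   # list vs. int: one False, not an element-wise mask
print(mask)           # False
targets[mask] = 10    # False is treated as index 0, so only the first entry changes
print(targets)        # [10, 9, 9, 4, 1]
```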

Also, your code is quite hard to read as you haven’t formatted it. You can post code snippets by wrapping them into three backticks ```, which would make it easier to debug your issue.


Thanks I will try this.