Change labels in Data Loader

shangeth · February 10, 2019, 2:07pm

I have a dataset of images and labels. I took a subset of it and want to change the labels of the whole subset to a single label.

E.g., MNIST has the labels 0 through 9; let’s say I want the labels 5, 6, 7, 8, 9 to all become 5, so the final labels are 0, 1, 2, 3, 4, 5.

How can I do this?

ptrblck · February 10, 2019, 5:09pm

You could set the new value using a condition on your targets:

from torchvision import datasets, transforms

dataset = datasets.MNIST(
    root='PATH',
    transform=transforms.ToTensor()
)

# Collapse every label above 5 into class 5
dataset.targets[dataset.targets > 5] = 5
print(dataset.targets.unique())
# tensor([0, 1, 2, 3, 4, 5])

shangeth · February 11, 2019, 4:44am

AttributeError: 'MNIST' object has no attribute 'targets'

ptrblck · February 11, 2019, 10:04am

In older torchvision versions, you had to use train_labels or test_labels, depending on whether the train argument was set to True or False, respectively.
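
A version-tolerant way to read the labels could look like the sketch below. The helper name `get_labels` is hypothetical (it is not a torchvision function), and the stand-in classes only mimic the attribute names mentioned above. It tries `targets` first and falls back to `train_labels` or `test_labels`.

```python
def get_labels(dataset):
    """Return the label container of a dataset, whatever the attribute is called.

    Hypothetical helper: tries the newer `targets` attribute first, then the
    older `train_labels` / `test_labels` names.
    """
    for name in ("targets", "train_labels", "test_labels"):
        labels = getattr(dataset, name, None)
        if labels is not None:
            return labels
    raise AttributeError("dataset exposes no known label attribute")


# Stand-in objects that only mimic the attribute names (no torchvision needed).
class NewStyle:
    targets = [0, 7, 9]


class OldStyleTrain:
    train_labels = [3, 5, 8]


print(get_labels(NewStyle()))
print(get_labels(OldStyleTrain()))
```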

tuotuoZ · April 4, 2019, 12:06am

Hi, is there a way to access the targets attribute of a dataset created with ImageFolder? I have a training set with 6 classes: building, forest, sea, street, glacier, and mountain. I only want to keep the forest class label and mark the rest as non-forest. I tried this:

dataset.targets[dataset.targets != 1] = 0

which didn’t work: it raised an error saying the dataset has no attribute targets.

ptrblck · April 4, 2019, 12:15am

Maybe you are using an older version.
Could you update torchvision and check that attribute again?
Also note that dataset.targets is a Python list in ImageFolder, so this indexing won’t work and you should cast it to a tensor first:

import torch
from torchvision import datasets

dataset = datasets.ImageFolder(root='PATH')
dataset.targets = torch.tensor(dataset.targets)
dataset.targets[dataset.targets == 0] = 1

tuotuoZ · April 4, 2019, 12:30am

Thanks for replying to me. I used conda update torchvision and my torchvision version is 0.2.1 now. It’s still not working. Is that the latest?

ptrblck · April 4, 2019, 9:43am

Could be. I usually just install torchvision from source, as it is really easy and gives you all the new features.
You would have to clone the repo and just run python setup.py install as described here.

tuotuoZ · April 4, 2019, 1:46pm

Thanks! I installed from source and it’s working now! Any idea why the pip and conda distributions don’t have 0.2.3 right now?

tuotuoZ · April 9, 2019, 6:54pm

Hi, I have a follow-up question. I successfully converted the targets attribute, but when I load the data with a DataLoader, it still returns the original targets.

train_set_two = datasets.ImageFolder(train_dir, transform=transform)
train_set_two.targets = torch.tensor(train_set_two.targets)
test_set_two.targets = torch.tensor(test_set_two.targets)
train_set_two.targets[train_set_two.targets > 1] = 1
test_set_two.targets[test_set_two.targets > 1] = 1

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_set_two, batch_size=batch_size,
    sampler=train_sampler, num_workers=num_workers)

After running this code, I visualized batches from train_loader, and they still contain 6 classes instead of 2. Any idea why?

ptrblck · April 9, 2019, 7:53pm

Oh, right. Internally, ImageFolder seems to read the labels from dataset.samples rather than dataset.targets, so you could try the following code:

train_set_two.samples = [(d, 1) if s > 1 else (d, s) for d, s in train_set_two.samples]
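
The same idea can be written as a small helper that rewrites the `(path, label)` pairs and derives a matching label list from them. This is a sketch under assumptions: `relabel_samples` is a hypothetical name, the file paths are made up, and the class indices assume the alphabetical ordering ImageFolder assigns to the six folders mentioned earlier (building=0, forest=1, glacier=2, mountain=3, sea=4, street=5).

```python
def relabel_samples(samples, mapping_fn):
    """Return a new list of (path, label) pairs with mapping_fn applied to each label."""
    return [(path, mapping_fn(label)) for path, label in samples]


# Made-up (path, class_index) pairs in the layout of ImageFolder.samples.
samples = [
    ("building/a.jpg", 0),
    ("forest/b.jpg", 1),
    ("glacier/c.jpg", 2),
    ("street/d.jpg", 5),
]

# Keep forest (index 1) as 1 and map every other class to 0.
new_samples = relabel_samples(samples, lambda label: 1 if label == 1 else 0)

# Derive the label list from the rewritten pairs so both stay in sync.
new_targets = [label for _, label in new_samples]

print(new_samples)
print(new_targets)
```

On a real dataset the results would be assigned back to `train_set_two.samples` and `train_set_two.targets`. Passing the mapping function as `target_transform` when constructing the ImageFolder is another option; it leaves the stored labels untouched and remaps them when each sample is loaded.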

Sam_Mertens · June 23, 2020, 6:10pm

Hi. I have a custom dataset stored in a .pth file. How do I select the labels from the dataset?

ptrblck · June 24, 2020, 6:07am

How did you store the dataset and what does the pth file contain?

Arthur_Conmy · July 15, 2020, 9:57pm

Hi, this was really useful to me for changing the labels in EMNIST from 1-26 to 0-25.

However, do you have a link to .targets in the PyTorch documentation? The word ‘target’ appears so often there that I can’t search for it successfully.

ptrblck · July 16, 2020, 12:59am

I assume you are referring to the first code snippet using the MNIST dataset?
If so, you can find the targets definition here.

Let me know if I misunderstood the question.

kiasari · April 9, 2023, 4:43am

TypeError: '>' not supported between instances of 'list' and 'int'

My code:

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True)
trainset.targets[trainset.targets > 5] = 5

ptrblck · April 9, 2023, 5:15am

You could either try to transform the targets list to a tensor via:

trainset.targets = torch.tensor(trainset.targets)

and check if this would break anything else, or you could create a custom Dataset and manipulate the targets explicitly internally.
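
The custom Dataset route could be sketched as a thin wrapper that remaps the label whenever a sample is accessed. `RelabeledDataset` is a hypothetical name, and a plain list stands in for the wrapped dataset. The sketch only implements `__len__` and `__getitem__`; in a real project one would typically subclass `torch.utils.data.Dataset`.

```python
class RelabeledDataset:
    """Wrap an indexable dataset of (sample, label) pairs and remap labels on access."""

    def __init__(self, base, label_fn):
        self.base = base          # any object with __len__ and __getitem__
        self.label_fn = label_fn  # callable: old label -> new label

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        sample, label = self.base[idx]
        return sample, self.label_fn(label)


# Stand-in for a dataset: strings instead of images, labels 0..9.
base = [(f"img{i}", i) for i in range(10)]

# Collapse every label above 5 into class 5; the base data is left unmodified.
wrapped = RelabeledDataset(base, lambda y: min(y, 5))
labels = [wrapped[i][1] for i in range(len(wrapped))]
print(labels)
```

Because the wrapped dataset is never modified, nothing else that relies on the original `targets` list is affected.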

apachetechnology · February 10, 2024, 1:08pm

I am also trying something similar.

import os
import matplotlib.pyplot as plt
import numpy as np

import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import Subset, DataLoader

def LoadCIFAR10_py():
    strDirPath = os.getcwd() + "\\DATA\\"
    aTrainData = datasets.CIFAR10(strDirPath, train=True, download=True)
    aTestData = datasets.CIFAR10(strDirPath, train=False, download=True)
    print("DATASET: Train=", len(aTrainData), " Test=", len(aTestData))
    #print(type(aTrainData), aTrainData)
    #print(aTestData[0])
    
    return aTrainData, aTestData

def CreateBinarySubset(aData, nLabel1, nLabel2):
    # Boolean tensor that is True where the sample belongs to class nLabel1
    idxLabel1 = torch.tensor(aData.targets) == nLabel1
    # Boolean tensor that is True where the sample belongs to class nLabel2
    idxLabel2 = torch.tensor(aData.targets) == nLabel2
    # print(idxLabel1.shape, idxLabel2.shape)

    # Merge the two masks into one Boolean tensor that is True where the sample
    # belongs to either class, and False otherwise.
    index_mask = idxLabel1 | idxLabel2
    data_indices = index_mask.nonzero().reshape(-1)
    oDataSubset = Subset(aData, data_indices)
    oDataloader = DataLoader(oDataSubset, shuffle=False, batch_size=8, num_workers=2)
    return oDataloader, oDataSubset 

if __name__ == "__main__":
    ## MAIN
    aTrainData, aTestData = LoadCIFAR10_py()

    ### Extract the samples with labels 1 and 9
    oTrainDataloader, oTrainDataSubset = CreateBinarySubset(aTrainData, 1, 9)
    print('Training subset: ', len(oTrainDataSubset))

    oTestDataloader, oTestDataSubset = CreateBinarySubset(aTestData, 1, 9)
    print('Testing subset: ', len(oTestDataSubset))

    # Problem:
    # Now I want to change the label 9 to label 0
    # oTestDataSubset does not have a targets attribute

    # This is not working
    aTrainData.targets[aTrainData.targets == 9] = 10 ## not working

Please advise.

ptrblck · February 10, 2024, 3:38pm

You can iterate the targets and manipulate them:

for idx, target in enumerate(oTestDataSubset.dataset.targets):
    if target == 9:
        oTestDataSubset.dataset.targets[idx] = 0
    
torch.tensor(oTestDataSubset.dataset.targets).unique(return_counts=True)
# (tensor([0, 1, 2, 3, 4, 5, 6, 7, 8]),
#  tensor([2000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]))

Also, your code is quite hard to read as you haven’t formatted it. You can post code snippets by wrapping them into three backticks ```, which would make it easier to debug your issue.
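
Since `targets` is a plain Python list for CIFAR10 (as the TypeError earlier in the thread shows), the loop over `oTestDataSubset.dataset.targets` can also be written as a list comprehension and checked with `collections.Counter`. The sketch below uses a short made-up label list as a stand-in for the real targets.

```python
from collections import Counter

# Made-up stand-in for dataset.targets (a plain list of ints).
targets = [1, 9, 9, 3, 1, 0, 9]

# Replace every 9 with 0 and leave all other labels unchanged.
targets = [0 if t == 9 else t for t in targets]

print(targets)
print(Counter(targets))
```

On the real object the new list would be assigned back to `oTestDataSubset.dataset.targets`. This changes the labels of the underlying dataset, so every Subset built on it sees the new values.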

apachetechnology · February 11, 2024, 9:32am

Thanks, I will try this.