Let's say I load a dataset using ImageFolder
because my data is structured that way.
Now I pick k
indices of my choice and use torch.utils.data.Subset
to create a subset dataset.
For these images, I want to modify the targets
and assign my own labels.
How do I do it?
Currently, I get this error: AttributeError: 'Subset' object has no attribute 'targets'
OK, I solved this issue with the following class.
Leaving it here for others:
import torch
from torch.utils.data import Dataset

class my_subset(Dataset):
    r"""
    Subset of a dataset at specified indices.
    Arguments:
        dataset (Dataset): The whole Dataset
        indices (sequence): Indices in the whole set selected for subset
        labels (sequence): Targets as required for the indices. Must be the same length as indices
    """
    def __init__(self, dataset, indices, labels):
        self.dataset = dataset
        self.indices = indices
        # Pre-fill with 300 (some number not present in the labels), so any
        # position outside `indices` is easy to spot.
        labels_hold = torch.full((len(dataset),), 300, dtype=torch.long)
        labels_hold[self.indices] = torch.as_tensor(labels, dtype=torch.long)
        self.labels = labels_hold

    def __getitem__(self, idx):
        # Both lookups go through the original (whole-dataset) index.
        image = self.dataset[self.indices[idx]][0]
        label = self.labels[self.indices[idx]]
        return (image, label)

    def __len__(self):
        return len(self.indices)
Hey, could you explain how you solved it?
I'm currently facing this issue: I read an image dataset using ImageFolder, created train and val sets using random_split(),
then tried to access train_set.classes
and got the same AttributeError you got.
What does it mean?
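For what it's worth, the error most likely comes from the fact that random_split() returns Subset objects, and a Subset does not forward attributes such as classes or targets from the dataset it wraps. It does keep a reference to the parent in .dataset and the selected positions in .indices. Here is a minimal sketch of reading both; the TensorDataset and the class names are stand-ins I made up so the snippet runs without image files (ImageFolder sets classes on its own):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for an ImageFolder dataset: 10 fake images with labels 0/1.
full_set = TensorDataset(torch.randn(10, 3, 4, 4), torch.arange(10) % 2)
full_set.classes = ["cat", "dog"]  # hypothetical; ImageFolder fills this in itself

train_set, val_set = random_split(full_set, [8, 2])

# The Subset has no `classes` attribute of its own ...
has_classes = hasattr(train_set, "classes")

# ... but the parent dataset and the chosen indices are still reachable.
classes = train_set.dataset.classes
train_targets = [int(full_set.tensors[1][i]) for i in train_set.indices]
```

With a real ImageFolder, the last line would presumably read from train_set.dataset.targets instead of full_set.tensors[1].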
Thank you so much for sharing, it worked like a charm.
To keep the same terminology as PyTorch and avoid two-level indexing, one could instead use:
from torch.utils.data import Dataset, Subset

class custom_subset(Dataset):
    r"""
    Subset of a dataset at specified indices.
    Arguments:
        dataset (Dataset): The whole Dataset
        indices (sequence): Indices in the whole set selected for subset
        labels (sequence): Targets as required for the indices. Must be the same length as indices
    """
    def __init__(self, dataset, indices, labels):
        # Subset handles the mapping from subset position to original index.
        self.dataset = Subset(dataset, indices)
        self.targets = labels

    def __getitem__(self, idx):
        image = self.dataset[idx][0]
        target = self.targets[idx]
        return (image, target)

    def __len__(self):
        return len(self.targets)
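In case a usage example helps, here is a rough end-to-end sketch of the same idea feeding a DataLoader. RelabeledSubset is a hypothetical name for a variant of the class above that also checks the lengths match, and the TensorDataset is only a stand-in for ImageFolder so that the snippet runs without image files:

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset, TensorDataset


class RelabeledSubset(Dataset):
    """Subset of `dataset` at `indices`, with `labels` replacing the original targets."""

    def __init__(self, dataset, indices, labels):
        if len(indices) != len(labels):
            raise ValueError("indices and labels must have the same length")
        self.dataset = Subset(dataset, indices)
        self.targets = torch.as_tensor(labels, dtype=torch.long)

    def __getitem__(self, idx):
        image = self.dataset[idx][0]  # discard the original target
        return image, self.targets[idx]

    def __len__(self):
        return len(self.targets)


# Stand-in for ImageFolder: 20 fake RGB images, all originally labelled 0.
full_set = TensorDataset(torch.randn(20, 3, 8, 8), torch.zeros(20, dtype=torch.long))

indices = [2, 5, 11, 17]   # positions in the full dataset
new_labels = [7, 7, 3, 9]  # custom labels, one per index
subset = RelabeledSubset(full_set, indices, new_labels)

loader = DataLoader(subset, batch_size=4, shuffle=False)
images, targets = next(iter(loader))
```

The labels are indexed by position in the subset, not by position in the original dataset, which is what removes the second level of indexing.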
Oh my goodness!
Thank you!
I’ve been working on this problem for about 40 hours LOL!
Maybe simply replacing the original attributes with subsampled ones can be an effective way:
import numpy as np

# load dataset (MyMNIST is my own dataset class)
train_set = MyMNIST(root=self.root, train=True, transform=transform, download=False)
# pick 10000 distinct indices for the training subset
index_sub = np.random.choice(np.arange(len(train_set)), 10000, replace=False)
# replace the attributes with their subsampled versions
train_set.data = train_set.data[index_sub]
train_set.targets = train_set.targets[index_sub]
# semi_targets is an extra attribute of MyMNIST; torchvision's MNIST does not have it
train_set.semi_targets = train_set.semi_targets[index_sub]
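A self-contained sketch of this attribute-replacement idea, in case someone wants to try it without MyMNIST. It assumes the dataset keeps everything in memory as indexable tensors, the way MNIST-style datasets do; InMemoryDataset is a hypothetical stand-in, and torch.randperm is swapped in for np.random.choice to avoid the NumPy dependency:

```python
import torch
from torch.utils.data import Dataset


class InMemoryDataset(Dataset):
    """Hypothetical MNIST-style dataset holding `data` and `targets` as tensors."""

    def __init__(self, n):
        self.data = torch.randn(n, 1, 28, 28)
        self.targets = torch.randint(0, 10, (n,))

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]

    def __len__(self):
        return len(self.data)


train_set = InMemoryDataset(100)
original_data = train_set.data  # kept only so the result can be checked

# 10 distinct random indices into the full dataset.
index_sub = torch.randperm(len(train_set))[:10]

# Overwrite the attributes in place; the dataset itself now has 10 samples.
train_set.data = train_set.data[index_sub]
train_set.targets = train_set.targets[index_sub]
```

Note that this changes the dataset object itself, so the samples that were dropped are gone unless a copy was kept. It also probably does not carry over to ImageFolder as is, since that class stores file paths and targets in Python lists rather than tensors.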