I have two datasets with differently aligned labels. How do I align the labels?

TanmDL · August 14, 2020, 9:21pm

I have two datasets, dataset A: {feature, label_A} and dataset B: {data, label_B}, where label_A and label_B are not aligned.

I want to write a custom dataset from these two datasets that can be passed to a DataLoader and returns feature[index], data[index], the aligned labels (A), and index.

Please help me.

TanmDL · August 14, 2020, 9:22pm

@ptrblck please look into this problem.

ptrblck · August 15, 2020, 6:05am

Could you explain a bit how these labels could be aligned?
If you have a lookup table for the corresponding indices between label_A and label_B, you could use this map to load the samples from both Datasets inside your custom Dataset.
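
As a minimal sketch of the lookup-table idea, assuming the map is stored as a tensor where index_map[i] is the index in B that corresponds to sample i in A: the class name AlignedPairDataset and the helper build_index_map are hypothetical, and pairing samples by equal class label is only one assumed alignment rule, so your real map might be built differently.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


class AlignedPairDataset(Dataset):
    """Pairs sample i of A with sample index_map[i] of B (hypothetical name)."""

    def __init__(self, features_a, labels_a, dataset_b, index_map):
        assert len(features_a) == len(labels_a) == len(index_map)
        self.features_a = features_a
        self.labels_a = labels_a
        self.dataset_b = dataset_b  # each item is a (data, label) pair
        self.index_map = index_map

    def __getitem__(self, index):
        feature = self.features_a[index]
        label_a = self.labels_a[index]
        # Follow the lookup table to the matching sample in B.
        data, _ = self.dataset_b[int(self.index_map[index])]
        return feature, data, label_a, index

    def __len__(self):
        return len(self.features_a)


def build_index_map(labels_a, labels_b):
    """Assumed rule: pair each sample in A with a sample in B of the same class."""
    index_map = torch.empty(len(labels_a), dtype=torch.long)
    for cls in labels_a.unique():
        idx_a = (labels_a == cls).nonzero(as_tuple=True)[0]
        idx_b = (labels_b == cls).nonzero(as_tuple=True)[0]
        if len(idx_b) == 0:
            raise ValueError(f"class {cls.item()} has no samples in B")
        # Cycle through B's samples of this class if A has more of them.
        index_map[idx_a] = idx_b[torch.arange(len(idx_a)) % len(idx_b)]
    return index_map


# Small stand-in tensors instead of the real datasets.
torch.manual_seed(0)
features_a = torch.randn(100, 16)
labels_a = torch.randint(0, 10, (100,))
data_b = torch.randn(80, 3, 8, 8)
labels_b = torch.arange(80) % 10  # every class is present in B
dataset_b = TensorDataset(data_b, labels_b)

index_map = build_index_map(labels_a, labels_b)
paired = AlignedPairDataset(features_a, labels_a, dataset_b, index_map)
loader = DataLoader(paired, batch_size=32, shuffle=True)
feature, data, label_a, index = next(iter(loader))
```

Because the index is returned with every sample, the pairing stays traceable even with shuffle=True.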

TanmDL · August 15, 2020, 9:30am

Thank you for your kind reply. For demonstration purposes, I am adding some demo code.

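import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms


class my_custom_dataset(torch.utils.data.Dataset):

  def __init__(self, data=None):
    # Second dataset; each item is a (sample, target) pair (here: CIFAR10)
    self.data = data

    # Random stand-ins for the first dataset's features and labels
    self.features_data = torch.from_numpy(np.random.randn(70000, 2048)).float()
    self.labels_data = torch.from_numpy(np.random.randint(0, 10, (70000,))).long()

  def __getitem__(self, index):
    data = self.features_data[index]
    target = self.labels_data[index]
    tr_data, tr_target = self.data[index]

    return data, target, tr_data, tr_target

  def __len__(self):
    # Use the shorter length so that index is valid for both datasets
    return min(len(self.features_data), len(self.data))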

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]))
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]))

total_dataset = train_dataset + test_dataset

custom_set = my_custom_dataset(data= total_dataset)

custom_loader = torch.utils.data.DataLoader(custom_set, batch_size=128, shuffle=False, num_workers=2)

data_, label, tr_data, tr_target = next(iter(custom_loader))
print(label[:20])
print(tr_target[:20])

print(data_.size())
print(tr_data.size())

I have posted the demo code above. Please take a look.

ptrblck · August 16, 2020, 2:33am

Thanks for the code. I still don’t understand where exactly the issue is.
Are you seeing unexpected results using the posted approach?
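
One quick way to check, as a rough sketch: compare the two label tensors from a batch and see how often they agree. The helper name label_agreement is hypothetical, and the tensors below are stand-ins for label and tr_target from your loader.

```python
import torch


def label_agreement(labels_a, labels_b):
    """Fraction of positions where both label tensors hold the same class."""
    return (labels_a == labels_b).float().mean().item()


# Stand-in tensors; in the demo these would be `label` and `tr_target`.
labels_a = torch.tensor([0, 1, 2, 3])
labels_b = torch.tensor([0, 1, 9, 9])
agreement = label_agreement(labels_a, labels_b)
print(f"agreement: {agreement:.2f}")
```

If one set of labels is drawn uniformly at random over 10 classes, as in the demo, the agreement should be close to 0.1 on average, which would indicate that the two sets of labels are not aligned by index.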