DataLoader is not working for labels


I have written one class to load my custom dataset. The class is as follows:

class ShapeDataset(Dataset):
	def __init__(self, root_dir, transform=None):
		self.img_names = glob.glob(os.path.join(root_dir, "*.jpg"))  # search root_dir instead of a hard-coded folder
		self.img_names = [os.path.basename(i) for i in self.img_names]
		self.txt_names = [ i[:-4]+'.txt' for i in self.img_names ]
		self.root_dir = root_dir
		self.labels = generate_labels(self.txt_names, self.root_dir)
		self.transform = transform

	def __getitem__(self, index):
		self.img_name = os.path.join(self.root_dir, self.img_names[index])
		self.image =
		self.label = self.labels[index]
		if self.transform:
			self.image = self.transform(self.image)

		return (self.image, self.label)

	def __len__(self):
		return len(self.img_names)

I tested it to make sure it is working:

data = ShapeDataset("training_data")
img, l = data.__getitem__(1347)
print(l)  #output [2, 2, 1]

Here, the label is a list of 3 elements. Everything worked fine until I used DataLoader as follows and tried to see some data.

data = dataset.ShapeDataset("training_data", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(data, batch_size=10, shuffle=False)

for batch_idx, (data, target) in enumerate(dataset_loader):
    print(target)

In each iteration, the target should be a list of 10 lists. But I get something like this:

[('1', '1', '0', '3', '2', '1', '1', '2', '2', '2'), ('2', '0', '3', '1', '3', '1', '4', '1', '2', '3'), ('2', '0', '0', '1', '0', '1', '2', '1', '1', '1')]

Clearly, this is not the expected output, and I don’t understand what went wrong. The images are batched correctly: each iteration returns a tensor of size 10 * 3 * height * width. Only the labels are wrong. Any help will be appreciated.

Note: The generate_labels function does some preprocessing to get the labels into the proper form. I don't think it matters here, because __getitem__ returns the label in the expected shape (a list of 3 elements).


I think the DataLoader fetches the samples for a batch (it expects them to be tensors) and puts them into a single tensor whose first dimension is the batch size. If you want to get a list of lists, you probably need to write your own collate function.
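To make the collate-function route concrete, here is a minimal sketch. As far as I can tell, the default collate function treats a list label as a sequence and collates it position by position, which would explain why the batch comes back as 3 tuples of 10 items instead of 10 lists of 3. `ToyDataset` and `list_label_collate` are hypothetical names, and the zero-filled 8x8 images are stand-ins for the real data, so treat this as an illustration rather than a drop-in fix.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for ShapeDataset: dummy images, list labels."""

    def __init__(self, n=20):
        self.images = [torch.zeros(3, 8, 8) for _ in range(n)]
        self.labels = [[i % 3, i % 5, i % 2] for i in range(n)]

    def __getitem__(self, index):
        return self.images[index], self.labels[index]

    def __len__(self):
        return len(self.images)


def list_label_collate(batch):
    """Stack the images into one tensor, keep the labels as a list of lists."""
    images = torch.stack([img for img, _ in batch])
    labels = [label for _, label in batch]
    return images, labels


loader = DataLoader(ToyDataset(), batch_size=10, shuffle=False,
                    collate_fn=list_label_collate)
images, labels = next(iter(loader))
print(images.shape)   # torch.Size([10, 3, 8, 8])
print(len(labels))    # 10
print(labels[1])      # [1, 1, 1]
```

The only change on the DataLoader side is the `collate_fn` argument; the dataset itself can keep returning a plain list from `__getitem__`.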
If you don’t care about having a list of lists, you can put the labels in a tensor instead. For example, wrapping your list in a tensor will work, so instead of this:

def __getitem__(self, index):
    return [1.0, 2.0, 3.0]

you can use this:

def __getitem__(self, index):
    return torch.FloatTensor([1.0, 2.0, 3.0])
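Applied to a dataset like the one in the question, the tensor approach might look roughly like the sketch below. `TensorLabelDataset` is a hypothetical stand-in with dummy images. The batch printed in the question contains string elements, so the sketch assumes the raw labels may be strings and converts them with `int()` first. It also uses an integer (long) tensor on the assumption that the labels are class indices; use a float tensor if they are not.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TensorLabelDataset(Dataset):
    """Hypothetical stand-in for ShapeDataset that returns tensor labels."""

    def __init__(self, n=20):
        # Assumption: labels arrive as lists of strings, as in the printed batch.
        self.raw_labels = [[str(i % 3), str(i % 5), str(i % 2)] for i in range(n)]

    def __getitem__(self, index):
        image = torch.zeros(3, 8, 8)  # dummy image in place of the real one
        # Convert each string to int, then wrap the list in a tensor.
        label = torch.tensor([int(v) for v in self.raw_labels[index]],
                             dtype=torch.long)
        return image, label

    def __len__(self):
        return len(self.raw_labels)


loader = DataLoader(TensorLabelDataset(), batch_size=10, shuffle=False)
data, target = next(iter(loader))
print(target.shape)  # torch.Size([10, 3])
print(target[1])     # tensor([1, 1, 1])
```

With tensor labels the default collate function stacks them along a new first dimension, so each row of `target` is the label of one sample.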

If you want to see how to make your own collate function then look here: