DataLoader returns the batch as a list

When applying the trained model to the actual data, my dataset only has X but no labels. I created a dataset class and then fed the dataset to a DataLoader. However, iterating over the DataLoader returns a list with the batch tensor as its only entry, while I expected the return to be a tensor. What did I do wrong? Below is the sample code.

import torch
from torch.utils.data import Dataset, DataLoader, TensorDataset

class MyDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset
    def __getitem__(self, index):
        data = self.dataset[index]
        return data
    def __len__(self):
        return len(self.dataset)

data_tensor = torch.tensor([[1, 2, 3], [1, 2, 3], [2, 3, 1], [1, 2, 3]])
data_set = MyDataset(TensorDataset(data_tensor))
data_loader = DataLoader(data_set, batch_size=2, shuffle=False)
next(iter(data_loader))

Here is the result:

[tensor([[1, 2, 3],
         [1, 2, 3]])]

I tested adding a label sequence to the dataset class alongside X by using TensorDataset(X, y), and then the return is a batch of tensors as expected. I'm wondering what the best practice is for creating the DataLoader when the data don't contain labels.

I found someone with the same question on Stack Overflow.


Check the source code:

class TensorDataset(Dataset):
    r"""Dataset wrapping tensors.

    Each sample will be retrieved by indexing tensors along the first dimension.

    Arguments:
        *tensors (Tensor): tensors that have the same size of the first dimension.
    """

    def __init__(self, *tensors):
        assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)
        self.tensors = tensors

    def __getitem__(self, index):
        return tuple(tensor[index] for tensor in self.tensors)

    def __len__(self):
        return self.tensors[0].size(0)

Why do you wrap the tensors in a TensorDataset?
I would say this is expected, as __getitem__ returns tuples.
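To illustrate (a minimal sketch reusing the data from your example): TensorDataset.__getitem__ returns a tuple even when it wraps a single tensor, and the DataLoader's default collate function preserves that structure, so each batch comes back as a list with one tensor per tensor you passed to TensorDataset. You can simply unpack it:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    data_tensor = torch.tensor([[1, 2, 3], [1, 2, 3], [2, 3, 1], [1, 2, 3]])

    # Each sample is a 1-tuple, because TensorDataset was given one tensor
    sample = TensorDataset(data_tensor)[0]
    print(type(sample), len(sample))  # <class 'tuple'> 1

    # The batch keeps that structure: a list containing one batch tensor
    loader = DataLoader(TensorDataset(data_tensor), batch_size=2, shuffle=False)
    (batch,) = next(iter(loader))  # unpack the single-element list
    print(batch.shape)  # torch.Size([2, 3])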

Thank you!
I followed a tutorial where the tensor was wrapped in a dataset. Using the tensor directly in the DataLoader works.
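For anyone landing here later, a minimal sketch of that approach: a tensor already implements __getitem__ and __len__, so DataLoader can consume it directly, and each batch is then a plain tensor rather than a list.

    import torch
    from torch.utils.data import DataLoader

    data_tensor = torch.tensor([[1, 2, 3], [1, 2, 3], [2, 3, 1], [1, 2, 3]])

    # No Dataset wrapper: the tensor itself is indexable and sized,
    # so the default collate simply stacks rows into a batch tensor
    loader = DataLoader(data_tensor, batch_size=2, shuffle=False)
    batch = next(iter(loader))
    print(batch)  # tensor([[1, 2, 3], [1, 2, 3]])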