Difference between ConcatDataset & ChainDataset

Hi there,
The documentation for ConcatDataset & ChainDataset isn’t exactly clear (at least to me).

If I wanted to have a dataloader that creates batches of:

  1. A matrix
  2. A vector
  3. A label

Does ConcatDataset allow me to iterate over the matrices and the vectors from two different datasets simultaneously? And when shuffling, will it return each matrix together with its corresponding vector?

Many thanks

ChainDataset is used for IterableDatasets, while ConcatDataset is used for the map-style datasets.
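A minimal sketch of the difference, assuming a recent PyTorch. `RangeStream` is a hypothetical toy IterableDataset made up for illustration. Both classes join datasets one after another; neither pairs samples from different datasets.

```python
import torch
from torch.utils.data import (
    ChainDataset,
    ConcatDataset,
    IterableDataset,
    TensorDataset,
)

# Map-style datasets: ConcatDataset supports len() and integer indexing.
ds_a = TensorDataset(torch.zeros(3, 1))  # becomes indices 0..2
ds_b = TensorDataset(torch.ones(2, 1))   # becomes indices 3..4
concat = ConcatDataset([ds_a, ds_b])
print(len(concat))  # 5: the lengths are summed
print(concat[3])    # the first sample of ds_b


# Iterable-style datasets: ChainDataset iterates one stream, then the next.
class RangeStream(IterableDataset):
    """Hypothetical toy IterableDataset yielding integers in [start, stop)."""

    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    def __iter__(self):
        return iter(range(self.start, self.stop))


chain = ChainDataset([RangeStream(0, 3), RangeStream(10, 12)])
print(list(chain))  # [0, 1, 2, 10, 11]
```

In both cases sample `i` comes from exactly one of the inner datasets, which is why neither helps when you need a matrix and a vector from two datasets in the same sample.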

No. ConcatDataset concatenates the passed datasets one after the other, so it won't yield samples from them simultaneously.
You could zip the DataLoaders and iterate them together:

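```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset1 = TensorDataset(torch.zeros(10, 1), torch.zeros(10, 1))
dataset2 = TensorDataset(torch.ones(10, 1), torch.ones(10, 1))

loader1 = DataLoader(dataset1, num_workers=2, batch_size=2)
loader2 = DataLoader(dataset2, num_workers=2, batch_size=2)

# The __main__ guard is required with num_workers > 0 where workers are spawned (Windows, macOS).
# zip yields one batch from each loader per step and stops at the shorter loader.
if __name__ == "__main__":
    for (x1, y1), (x2, y2) in zip(loader1, loader2):
        print(x1, y1)
        print(x2, y2)
```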
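Note that zipping only keeps samples aligned while neither loader shuffles: with `shuffle=True`, each DataLoader draws its own random permutation, so matching positions would no longer correspond. If the matrix, vector and label belong to the same sample, a sketch of an alternative (assuming all three fit in tensors with a common first dimension; the data below is made up) is a single dataset that returns all three, so one shuffle keeps them together:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 10 samples, each a 3x3 matrix, a length-4 vector and a class label.
matrices = torch.randn(10, 3, 3)
vectors = torch.randn(10, 4)
labels = torch.randint(0, 2, (10,))

# TensorDataset indexes every tensor with the same index, so sample i is always
# (matrices[i], vectors[i], labels[i]), even when the loader shuffles the indices.
dataset = TensorDataset(matrices, vectors, labels)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for mat, vec, label in loader:
    print(mat.shape, vec.shape, label.shape)  # shapes (2, 3, 3), (2, 4), (2,)
```

If the data lives in separate files instead, the same idea applies with a custom map-style Dataset whose `__getitem__` loads the matrix, vector and label for one index.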

Hi Piotr,
Many thanks for the thorough reply. I have zipped the two loaders together. Your continual replies all over the forum are a great help to a huge number of people.
