tf.data.Dataset, a dataset can be created from ([a, b, c], [d, e, f]), in which case the tuples (a, d), (b, e) and (c, f) will be issued when reading it.
The main benefit is that when calling .batch(X), each component of the tuple is batched separately, which makes it easy to vectorize preprocessing.
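To make the behavior I mean concrete, here is a minimal pure-Python sketch of it. It is not TensorFlow or TorchData code: `from_slices` and `batch` are hypothetical helpers written only for illustration, and the integer examples and labels are made up.

```python
from itertools import islice


def from_slices(components):
    """Yield one tuple per index: ([a, b, c], [d, e, f]) -> (a, d), (b, e), (c, f)."""
    yield from zip(*components)


def batch(elements, batch_size):
    """Group elements into batches, keeping each tuple component in its own list."""
    iterator = iter(elements)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        # Transpose: a list of (x, y) tuples becomes ([x, ...], [y, ...]).
        yield tuple(list(component) for component in zip(*chunk))


# Made-up data standing in for examples and labels.
examples = [1, 2, 3]
labels = [10, 20, 30]

elements = list(from_slices((examples, labels)))
batches = list(batch(from_slices((examples, labels)), 2))
```

With this shape, each batch is a tuple of lists (all examples together, all labels together), so a preprocessing step can be applied to a whole list of examples at once instead of one element at a time.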
Now, one may think: hey, what about doing this with 2 DataPipes? Well, if the in-memory structure you are reading examples and labels from only returns both at a time, the only efficient (high-performance) way of building the pipeline is to have a single one that does what TF does.
I am very new to TorchText, but it seemed to me that this is not possible, so I am asking here.
Thanks in advance for the answers!