I am working with multiple files, and multiple training samples in each file. I will use
ConcatDataset as described here:
I need negative samples in addition to my true samples, and the negative samples must be randomly selected from across all of the training data files. So I am wondering: would the returned batch be a random consecutive chunk from a single random file, or would the batch span multiple random indexes across all the data files?
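To make the question concrete, here is a minimal pure-Python sketch of the index mapping I understand `ConcatDataset` to do (the `TinyConcatDataset` class is my own illustrative stand-in, not the real implementation). My understanding is that a shuffled sampler permutes *global* indices over the whole concatenation, so consecutive draws can land in different files:

```python
import random
from bisect import bisect_right
from itertools import accumulate

class TinyConcatDataset:
    """Illustrative stand-in for torch.utils.data.ConcatDataset:
    maps one global index onto (which file, local index in that file)."""
    def __init__(self, datasets):
        self.datasets = datasets
        # cumulative sizes, e.g. [3, 7] for file lengths 3 and 4
        self.cumulative = list(accumulate(len(d) for d in datasets))

    def __len__(self):
        return self.cumulative[-1]

    def __getitem__(self, idx):
        file_idx = bisect_right(self.cumulative, idx)
        local = idx if file_idx == 0 else idx - self.cumulative[file_idx - 1]
        return self.datasets[file_idx][local]

# Two "files" of samples
file_a = ["a0", "a1", "a2"]
file_b = ["b0", "b1", "b2", "b3"]
concat = TinyConcatDataset([file_a, file_b])

# With shuffle=True a DataLoader draws from a permutation of the
# global range, so one batch can mix samples from both files.
random.seed(0)
order = random.sample(range(len(concat)), len(concat))
batch = [concat[i] for i in order[:4]]
```

If this mental model is right, then shuffling gives batches that span files, and only `shuffle=False` would produce consecutive chunks from one file at a time.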
If it helps to know why I need this exact setup: I am trying to train on a TPU with PyTorch XLA.
Normally, for negative samples, I would just use a second DataLoader. However, I am trying to train on TPUs with PyTorch XLA (the alpha was released just a few days ago: https://github.com/pytorch/xla ), and to do that I need to pass my DataLoader to a torch_xla.distributed.data_parallel.DataParallel object, e.g. model_parallel(train_loop_fn, train_loader), as can be seen in these example notebooks
So I am now limited to a single DataLoader, which must handle both the true samples and the negative samples that need to be randomly selected from all of my files.
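One workaround I am considering (an assumption about my own setup, not something from the XLA examples): wrap the concatenated data in a dataset whose `__getitem__` returns a (true, negative) pair, with the negative drawn uniformly from all files, so a single DataLoader carries both. The `PairDataset` name and structure here are hypothetical:

```python
import random

class PairDataset:
    """Hypothetical sketch: each item yields (true_sample, negative_sample),
    with the negative drawn uniformly at random from ALL samples,
    regardless of which file the true sample came from."""
    def __init__(self, samples, seed=None):
        self.samples = samples  # e.g. a ConcatDataset over all files
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        true = self.samples[idx]
        # draw from len-1 slots and skip idx, so the negative
        # is never the anchor sample itself
        neg_idx = self.rng.randrange(len(self.samples) - 1)
        if neg_idx >= idx:
            neg_idx += 1
        return true, self.samples[neg_idx]

all_samples = ["a0", "a1", "b0", "b1"]  # pretend these came from several files
ds = PairDataset(all_samples, seed=0)
true, neg = ds[0]
```

A DataLoader over this dataset would then yield batches of pairs, and the single-loader restriction from the DataParallel API would no longer matter.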