I want to load two datasets in parallel using torch.utils.data.DataLoader. The caveat is that the batch sizes do not necessarily match. This is what I am currently doing:
dataset1 = datasets.MNIST(...)
dataset2 = datasets.SVHN(...)
loader1 = torch.utils.data.DataLoader(dataset1, batch_size=32, num_workers=4)
loader2 = torch.utils.data.DataLoader(dataset2, batch_size=64, num_workers=4)
for (x1, y1), (x2, y2) in zip(loader1, loader2):
    pass
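One side effect of this scheme worth noting: zip stops as soon as the shorter iterable is exhausted, so any leftover batches of the longer loader are silently dropped each epoch. A minimal pure-Python illustration (with plain lists standing in for the two loaders):

```python
# Stand-ins for the two loaders; the batch counts differ because
# the datasets and batch sizes differ (e.g. len(dataset)//batch_size).
batches1 = [f"batch1_{i}" for i in range(4)]
batches2 = [f"batch2_{i}" for i in range(2)]

# zip truncates to the shorter sequence:
pairs = list(zip(batches1, batches2))
assert len(pairs) == 2  # the last two batches of batches1 are never seen
```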
I regularly get EOFError and ConnectionResetError messages when running this scheme.
So I am wondering whether there is a more elegant way of implementing this with the existing DataLoader, perhaps via a custom Dataset implementation. Or maybe there is an even simpler approach I am currently overlooking.
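To make the custom-Dataset idea concrete, here is a rough sketch of what I have in mind (PairedDataset and its ratio parameter are my own names, not from any library): since the two batch sizes have ratio 64/32 = 2, each item could pair one sample from dataset1 with two consecutive samples from dataset2, so a single DataLoader with batch_size=32 would effectively yield 32 and 64 samples per batch from one set of workers.

```python
class PairedDataset:
    """Hypothetical wrapper pairing each item of dataset1 with
    `ratio` consecutive items of dataset2, so one DataLoader can
    drive both sources."""

    def __init__(self, dataset1, dataset2, ratio=2):
        self.dataset1 = dataset1
        self.dataset2 = dataset2
        self.ratio = ratio

    def __len__(self):
        # Limited by whichever source runs out first.
        return min(len(self.dataset1), len(self.dataset2) // self.ratio)

    def __getitem__(self, idx):
        item1 = self.dataset1[idx]
        items2 = [self.dataset2[self.ratio * idx + k] for k in range(self.ratio)]
        return item1, items2
```

This would then be wrapped in a single DataLoader (with batch_size=32 and a custom collate_fn so the lists of dataset2 samples stack correctly), but I am not sure this is the cleanest route.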
I would appreciate any ideas!