Enumerate/iter for multiple dataloaders

I am trying to divide MNIST into separate subsets where each subset is a different class.

train_dataset = datasets.MNIST('/some/dir/', train=True, download=True, transform=self.dataset_transform)

for i in range(10):
    idx = train_dataset.targets==i
    train_dataset.targets[idx]
    train_subsets.append(torch.utils.data.Subset(train_dataset, idx))
    train_loaders.append(torch.utils.data.DataLoader(train_subsets[i], batch_size=self.batch_size, shuffle=True))
    train_iters.append(iter(train_loaders[i]))

Beginning of my training loop:

for label_id in range(10):
    data, target = next(train_iters[label_id])

When training, I get a ValueError:

only one element tensors can be converted to Python scalars

What is going on here?

Hello,

In my humble opinion, there is a problem with your snippet.

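Here, idx is a mask tensor with values 0 and 1 (one entry per sample), not a list of sample positions. If you pass it as the indices argument to Subset, its entries are used as indices themselves, so the subset can only refer to positions 0 and 1 rather than to the samples of class i.
Also, the bare expression train_dataset.targets[idx] has no effect, because its result is never used.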

For the error itself, try printing the shape of idx and of the items in train_subsets[label_id] to debug it.
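
To make the mask-versus-indices point concrete, here is a minimal sketch. It uses a small made-up TensorDataset in place of MNIST (the data, labels, and variable names are assumptions for illustration only) and turns the mask into integer positions with torch.nonzero before handing them to Subset.

```python
import torch
from torch.utils.data import Subset, TensorDataset

# Toy stand-in for MNIST (hypothetical data): 6 samples with labels 0..2.
data = torch.arange(6).float().unsqueeze(1)
targets = torch.tensor([0, 1, 2, 1, 0, 1])
dataset = TensorDataset(data, targets)

# A comparison yields a mask with one entry per sample, not a list of positions.
mask = targets == 1
# Subset expects integer positions, so convert the mask first.
indices = torch.nonzero(mask).flatten().tolist()

print(mask.tolist())  # [False, True, False, True, False, True]
print(indices)        # [1, 3, 5]

subset = Subset(dataset, indices)
labels = [int(subset[k][1]) for k in range(len(subset))]
print(labels)         # [1, 1, 1]
```

The subset built from the integer positions contains exactly the three samples labelled 1.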

Finally, I was able to make this work with:

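for i in range(10):
    # Boolean mask: True where the label equals i.
    idx = train_dataset.targets==i
    idx_np = idx.numpy()
    # Multiply the mask by each position: selected entries hold their own index, the rest hold 0.
    idx_num = idx_np * range(len(idx))
    # Keep the nonzero entries as the index list. Caveat: this also discards position 0 itself.
    idx_no_zeros = list(filter(lambda a: a != 0, idx_num))
    train_subsets.append(torch.utils.data.Subset(train_dataset, idx_no_zeros))
    train_loaders.append(torch.utils.data.DataLoader(train_subsets[i], batch_size=self.batch_size, shuffle=True))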
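
For anyone landing here later, a shorter way to build the same per-class index lists is torch.nonzero, which also keeps sample 0 when it belongs to the class. The sketch below is not the code from this thread: it assumes a small random TensorDataset instead of MNIST, and num_classes, the batch size, and the restart-on-StopIteration handling are illustrative choices. It draws one batch per class with the built-in next().

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Hypothetical toy data standing in for MNIST: 20 samples, labels cycling 0..3.
num_classes = 4
targets = torch.arange(20) % num_classes
data = torch.randn(20, 1, 28, 28)
dataset = TensorDataset(data, targets)

# One DataLoader per class, each built from integer positions.
train_loaders = []
for i in range(num_classes):
    class_indices = torch.nonzero(targets == i).flatten().tolist()
    subset = Subset(dataset, class_indices)
    train_loaders.append(DataLoader(subset, batch_size=2, shuffle=True))

train_iters = [iter(loader) for loader in train_loaders]

# Draw one batch from every class loader.
batches = []
for label_id in range(num_classes):
    try:
        x, y = next(train_iters[label_id])
    except StopIteration:
        # This class is exhausted: start a fresh pass over its loader.
        train_iters[label_id] = iter(train_loaders[label_id])
        x, y = next(train_iters[label_id])
    batches.append((x, y))
```

Each batch then contains samples of a single class, and a loader that runs out is simply restarted instead of ending the loop.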