Integrating LEAF benchmark datasets into PyTorch

I’ve been trying to simulate federated learning (FL) with PyTorch.
For FL, there’s a benchmark called LEAF, which contains datasets that are particularly well suited to the non-IID data partitioning that arises in the FL setting.

Is there an easy way to integrate the datasets from that benchmark into PyTorch? As far as I can see, they’re currently not available via torchvision.datasets.

If you want to preload the complete dataset, you could pass it to a TensorDataset.
On the other hand, if you are dealing with image data, which is stored as separate image files in folders corresponding to the classes, you could use ImageFolder.
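
For the preloaded case, something like the following might work. It is only a minimal sketch: the random tensors are hypothetical stand-ins for data you have already loaded into memory, and the shapes (784 features, 10 classes) are assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-in data: 100 samples, 784 features each, 10 classes.
features = torch.randn(100, 784)
labels = torch.randint(0, 10, (100,))

# TensorDataset indexes all tensors along their first dimension.
dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch_x, batch_y in loader:
    # batch_x has shape (batch, 784); the last batch may be smaller than 32.
    pass
```

All tensors passed to TensorDataset need the same size in the first dimension, so this fits best when the whole dataset comfortably fits in memory.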

Depending on the dataset and how each sample is stored, you might write a custom Dataset as described here.
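
As a rough sketch of the custom Dataset route: the code below assumes the preprocessed LEAF data has been parsed (e.g. with `json.load`) into a dict with a `users` list and a `user_data` mapping from each user to `x` and `y` lists. Please check that layout against your own files. `LeafClientDataset` and the tiny hand-made `leaf_data` are hypothetical names used only for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class LeafClientDataset(Dataset):
    """One client's samples from a LEAF-style dict (hypothetical helper)."""

    def __init__(self, leaf_data, user):
        # Assumed layout: leaf_data["user_data"][user] == {"x": [...], "y": [...]}
        record = leaf_data["user_data"][user]
        self.x = torch.tensor(record["x"], dtype=torch.float32)
        self.y = torch.tensor(record["y"], dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]


# Tiny hand-made example in the assumed layout; real data would come from json.load.
leaf_data = {
    "users": ["u0", "u1"],
    "user_data": {
        "u0": {"x": [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]], "y": [0, 1, 1]},
        "u1": {"x": [[0.2, 0.8]], "y": [0]},
    },
}

# One DataLoader per client, mirroring the per-user partition used in FL.
client_loaders = {
    user: DataLoader(LeafClientDataset(leaf_data, user), batch_size=2)
    for user in leaf_data["users"]
}
```

Keeping one Dataset per user preserves the natural non-IID split, so each simulated client only ever sees its own samples.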
