Using an AWS S3 bucket dataset with torchvision.datasets.ImageFolder

HugoH · June 13, 2025, 10:21pm

Hi all, I’m trying to load a large-scale image dataset from an S3 bucket into my SageMaker JupyterLab instance so that I can train on it later. However, I run into a problem when setting up the data augmentations with torchvision’s datasets.ImageFolder: ImageFolder does not accept a list or an S3MapDataset, it only accepts a local filesystem path (a str or os.PathLike) as its root.

I have been trying to work around this with the S3 Plugin and the S3 Connector for PyTorch, but both return the data from S3 as other types rather than as an os.PathLike.
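To make the gap concrete: ImageFolder derives labels from a `root/class_name/image` directory layout, so the same labelling would have to be reproduced over S3 object keys. Below is a minimal sketch of that idea, assuming the keys follow `prefix/class_name/file`. The function `build_samples`, the extension list, and the example keys are hypothetical and not part of torchvision or either S3 library; fetching and decoding the image bytes is left out.

```python
from pathlib import PurePosixPath

# Assumed set of image extensions; adjust to match the dataset.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp")


def build_samples(keys, prefix=""):
    """Map S3 object keys to ImageFolder-style (key, class_index) pairs.

    The class name is taken from the first path component after `prefix`,
    and class indices are assigned in sorted order of class name.
    """
    prefix_parts = PurePosixPath(prefix).parts if prefix else ()
    labelled = []
    for key in keys:
        path = PurePosixPath(key)
        # Skip anything that does not look like an image file.
        if path.suffix.lower() not in IMG_EXTENSIONS:
            continue
        # Skip keys that are not under the requested prefix.
        if path.parts[:len(prefix_parts)] != prefix_parts:
            continue
        rel = path.parts[len(prefix_parts):]
        # A class folder plus a file name is required.
        if len(rel) < 2:
            continue
        labelled.append((key, rel[0]))

    classes = sorted({cls for _, cls in labelled})
    class_to_idx = {cls: i for i, cls in enumerate(classes)}
    samples = [(key, class_to_idx[cls]) for key, cls in sorted(labelled)]
    return samples, class_to_idx


# Hypothetical keys, as they might come back from listing a bucket.
example_keys = [
    "train/dog/2.png",
    "train/cat/1.jpg",
    "train/cat/notes.txt",
    "train/readme.md",
]
samples, class_to_idx = build_samples(example_keys, prefix="train")
print(class_to_idx)
print(samples)
```

The resulting `samples` list could then back a custom `torch.utils.data.Dataset` whose `__getitem__` downloads the object for a key, decodes it with PIL, and applies the usual torchvision transforms; that part depends on which S3 client is used and is not shown here.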

Similar questions have been asked on this forum before, for example "Can I use torchvision Dataset and Dataloader with AWS S3?", but there still seems to be no clear answer.

Is there any update on this issue as of June 2025? Does anyone know of a workaround that could solve this problem?