CelebA dataset download errors

Dear Sir/Madam,

I am having issues downloading the CelebA dataset. It appears that some of the data is not in .zip format which is throwing an error in celeba.py:

3098it [00:00, 4697741.79it/s]
3098it [00:00, 930932.35it/s]
3103it [00:00, 943931.34it/s]
3098it [00:00, 913138.00it/s]
3098it [00:00, 3096009.96it/s]
3098it [00:00, 803484.65it/s]
Traceback (most recent call last):
  File "train_vqvae.py", line 101, in <module>
    dataset = CelebA(root_path, split='all', transform=transform, download=True)
  File "/vol/bitbucket/hgc19/env/lib/python3.6/site-packages/torchvision/datasets/celeba.py", line 66, in __init__
    self.download()
  File "/vol/bitbucket/hgc19/env/lib/python3.6/site-packages/torchvision/datasets/celeba.py", line 120, in download
    with zipfile.ZipFile(os.path.join(self.root, self.base_folder, "img_align_celeba.zip"), "r") as f:
  File "/usr/lib/python3.6/zipfile.py", line 1131, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.6/zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Any help would be greatly appreciated.

Thanks in advance.

I'm not totally sure, but it tries to download the files from Google Drive. It can download the files that are small enough (like list_bbox_celeba.txt with the link https://docs.google.com/u/0/uc?id=0B7EVK8r0v71pbThiMVRxWXZ4dU0), but some files are too big, and for those you get an extra warning page instead of the file (e.g. for img_align_celeba.zip: https://docs.google.com/u/0/uc?id=0B7EVK8r0v71pZjFTYXZWM3FlRnM).
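
One way to check this guess is to look at what was actually saved as img_align_celeba.zip. The snippet below is only a rough sketch: diagnose_download is a hypothetical helper (not part of torchvision), and it assumes that the warning page is saved as HTML text.

```python
import zipfile
from pathlib import Path


def diagnose_download(path):
    """Classify a downloaded file as 'zip', 'html', 'missing' or 'unknown'.

    Hypothetical helper for illustration; not part of torchvision.
    """
    path = Path(path)
    if not path.is_file():
        return "missing"
    # is_zipfile looks for the zip structure that zipfile.ZipFile needs.
    if zipfile.is_zipfile(path):
        return "zip"
    # Assumption: a Google Drive warning page is stored as HTML text,
    # so the first bytes of the file look like an HTML document.
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

If this returns "html" for root/celeba/img_align_celeba.zip, the file is most likely the warning page and not the archive, which would explain the BadZipFile error.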

I recommend downloading the dataset manually from Google Drive (https://drive.google.com/drive/folders/0B7EVK8r0v71pWEZsZE9oNnFzTm8) and using this download folder as the root for the torchvision.datasets.CelebA class call!
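
A minimal sketch of what that could look like is below. Please treat the details as assumptions: the file names and the "celeba" subfolder are what I believe torchvision's celeba.py expects (check your installed version), the files are assumed to sit flat inside root/celeba, and missing_celeba_files and load_celeba are hypothetical helper names.

```python
from pathlib import Path

# Assumption: these are the files torchvision's celeba.py looks for
# inside <root>/celeba; verify against your installed version.
EXPECTED_FILES = [
    "img_align_celeba.zip",
    "list_attr_celeba.txt",
    "identity_CelebA.txt",
    "list_bbox_celeba.txt",
    "list_landmarks_align_celeba.txt",
    "list_eval_partition.txt",
]


def missing_celeba_files(root, base_folder="celeba"):
    """Return the expected files that are not present under root/base_folder."""
    folder = Path(root) / base_folder
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


def load_celeba(root, transform=None):
    """Build the dataset from manually downloaded files (hypothetical helper)."""
    # Imported lazily so the file check above works without torchvision.
    from torchvision.datasets import CelebA

    missing = missing_celeba_files(root)
    if missing:
        raise FileNotFoundError(f"Missing under {root}/celeba: {missing}")
    # Same call as in the question. Assumption: with download=True,
    # torchvision skips files that are already present and valid and
    # only extracts the archive.
    return CelebA(root, split="all", transform=transform, download=True)
```

Calling missing_celeba_files(root_path) first shows which files still need to be copied into place before the dataset is constructed.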

Thank you for your speedy reply. That worked great. In the end, I used datasets.ImageFolder and utils.data.DataLoader.
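
For anyone following along, that approach might look roughly like the sketch below. It assumes the images were extracted to root/img_align_celeba/*.jpg, so ImageFolder sees a single "class" folder; the helper names, image_size and batch_size are illustrative choices and not taken from the actual training script.

```python
from pathlib import Path


def class_folders(root, suffix=".jpg"):
    """List subfolders of root that contain images.

    ImageFolder treats every subfolder of root as one class, so for
    CelebA this would typically be just ['img_align_celeba'].
    """
    return sorted(
        d.name
        for d in Path(root).iterdir()
        if d.is_dir() and any(p.suffix.lower() == suffix for p in d.iterdir())
    )


def make_loader(root, image_size=64, batch_size=32):
    """Build a DataLoader over the extracted images (hypothetical helper)."""
    # Imported lazily; this part requires torch and torchvision.
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.Resize(image_size),
        transforms.CenterCrop(image_size),
        transforms.ToTensor(),
    ])
    # The labels are just the folder index (0 for a single folder);
    # the CelebA attribute annotations are not loaded this way.
    dataset = datasets.ImageFolder(root, transform=transform)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)
```

This skips the CelebA class entirely, which is fine when only the images are needed.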

Cheers