iNaturalist download crashes towards the end

The INaturalist download proceeds fine until the very end, then crashes with this error:

File "/scratch/user/myenvs/env/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 152, in download_url
    raise RuntimeError("File not found or corrupted.")

I'm using torchvision's automatic download for the 2021_train dataset.

What version of the library are you running? You may want to upgrade to the latest version.

The error is raised when the integrity check fails. You can retry with download=True to see whether it works or whether you get a different error message.

torchvision==0.11.3

And download=True is enabled :confused:

It gets through most of the download and then fails at the end. I even tested it across multiple systems.

Can you verify that the files are at the root location and intact? If they are not, you can remove them and download them again.

Well, there’s an INaturalist_Train/2021_train.tgz. But I deleted it and tried re-downloading, and the same error occurred at the end of the download. When I try running, I get:

  File "/u/slerman/miniconda3/envs/env/lib/python3.8/site-packages/torchvision/datasets/utils.py", line 152, in download_url
    raise RuntimeError("File not found or corrupted.")

Thanks for the report @Sam_Lerman. I’ve opened an issue here to track it.

I downloaded the archive and computed the MD5 checksum:

e0526d53c7f7b2e3167b2b43bb2690ed

@Sam_Lerman can you confirm that? If yes, the one torchvision has on record is wrong and needs to be fixed.

What do I need to do to confirm it?

Yes, the output is e0526d53c7f7b2e3167b2b43bb2690ed.

This was fixed in pytorch/vision pull request #5844 ("fix INaturalist 2021_train checksum" by pmeier). If you are using nightly builds, the fix will be available later today. Otherwise, you can follow the instructions in pytorch/vision issue #5817 ("INaturalist download is broken for version="2021_train"") for a workaround until the next release.