pinocchio
(Rene Sandoval)
March 24, 2020, 6:01pm
1
I want to download a dataset from a specific URL to a specific path.
I tried the following:
from torchvision.datasets.utils import download_and_extract_archive
## download mini-imagenet
url = 'https://drive.google.com/file/d/1rV3aj_hgfNTfCakffpPm7Vhpr1in87CR'
filename = 'miniImagenet.tgz'
root = '~/tmp/'
download_and_extract_archive(url, root, filename)
but it didn’t work.
Why? How do we fix it?
Error:
Traceback (most recent call last):
  File "/Users/me/pytorch_playground.py", line 79, in <module>
    download_mini_imagenet()
  File "/Users/me/pytorch_playground.py", line 72, in download_mini_imagenet
    download_and_extract_archive(url, root, filename)
  File "/Users/me/lib/python3.7/site-packages/torchvision/datasets/utils.py", line 268, in download_and_extract_archive
    extract_archive(archive, extract_root, remove_finished)
  File "/Users/me/lib/python3.7/site-packages/torchvision/datasets/utils.py", line 250, in extract_archive
    raise ValueError("Extraction of {} not supported".format(from_path))
ValueError: Extraction of /Users/me/tmp/1rV3aj_hgfNTfCakffpPm7Vhpr1in87CR not supported
Related GitHub issue: https://github.com/pytorch/vision/issues/1028
simaiden
(Simón Sepúlveda Osses)
March 24, 2020, 6:09pm
2
That’s because a Google Drive share URL is not a direct download link. Try this instead:
from torchvision.datasets.utils import download_file_from_google_drive
and extract the archive yourself, or adapt the code and use torchvision.datasets.utils.extract_archive.
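For a .tgz archive like this one, extracting it yourself can be done with the standard library alone. Here is a minimal sketch, assuming the archive is already on disk and comes from a source you trust. `extract_tgz` is a made-up helper name for illustration, not part of torchvision:

```python
import os
import tarfile


def extract_tgz(archive_path, dest_dir=None):
    """Extract a .tgz / .tar.gz archive and return the directory it was extracted into.

    Hypothetical helper: by default it extracts next to the archive.
    """
    archive_path = os.path.abspath(os.path.expanduser(archive_path))
    if dest_dir is None:
        dest_dir = os.path.dirname(archive_path)
    dest_dir = os.path.expanduser(dest_dir)
    os.makedirs(dest_dir, exist_ok=True)
    # "r:gz" opens a gzip-compressed tar; "r:*" would auto-detect the compression
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return dest_dir
```

Usage would look like `extract_tgz('~/tmp/miniImagenet.tgz')`. The right module does depend on the archive format: `tarfile` handles tar-based archives and `zipfile` handles .zip files. `shutil.unpack_archive` picks the format from the file extension if you would rather not choose by hand.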
pinocchio
(Rene Sandoval)
March 24, 2020, 6:35pm
3
simaiden:
extract by yourself
How do you extract the contents? Does it depend on the archive format? Any examples?
pinocchio
(Rene Sandoval)
March 24, 2020, 6:56pm
4
Do you know how to NOT download the file if the dataset has already been downloaded?
I was reading:
def _check_integrity(self):
    zip_filename = self._get_target_folder()
    if not check_integrity(join(self.root, zip_filename + '.zip'), self.zips_md5[zip_filename]):
        return False
    return True
and it doesn’t seem to be doable in my case because I do not have an MD5 hash for this dataset…
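Without an MD5 hash, one fallback is to check only whether the file is already on disk. Here is a minimal sketch, assuming a plain existence check is good enough; it will not catch a partial or corrupted download. `already_downloaded` is a made-up helper name, not a torchvision function:

```python
import hashlib
import os


def already_downloaded(fpath, md5=None):
    """Return True if fpath exists and, when an md5 is given, its hash matches.

    Hypothetical helper for illustration only.
    """
    fpath = os.path.expanduser(fpath)
    if not os.path.isfile(fpath):
        return False
    if md5 is None:
        # no hash known: existence is the only thing we can check
        return True
    digest = hashlib.md5()
    with open(fpath, "rb") as f:
        # read in 1 MiB chunks so large archives are not loaded into memory at once
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == md5
```

The download call can then be guarded with something like `if not already_downloaded('~/tmp/miniImagenet.tgz'): ...`. As far as I can tell, torchvision's own `check_integrity` behaves similarly when `md5` is left as `None`, but that may depend on the installed version.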
pinocchio
(Rene Sandoval)
March 24, 2020, 7:03pm
5
Temporary solution:
def download_and_extract_miniImagenet(root):
    import os
    from torchvision.datasets.utils import download_file_from_google_drive, extract_archive

    ## download miniImagenet
    #url = 'https://drive.google.com/file/d/1rV3aj_hgfNTfCakffpPm7Vhpr1in87CR'
    file_id = '1rV3aj_hgfNTfCakffpPm7Vhpr1in87CR'
    filename = 'miniImagenet.tgz'
    download_file_from_google_drive(file_id, root, filename)
    fpath = os.path.join(root, filename)  # this is what download_file_from_google_drive does

    ## extract downloaded dataset
    from_path = os.path.expanduser(fpath)
    extract_archive(from_path)
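The file_id is just the path segment after /file/d/ in the share URL, so it can be pulled out programmatically instead of copied by hand. Here is a rough sketch, assuming the link has the https://drive.google.com/file/d/<id> shape shown above; other Google Drive link formats are not handled. `drive_file_id` is a made-up helper name:

```python
from urllib.parse import urlparse


def drive_file_id(share_url):
    """Return the id from a URL shaped like https://drive.google.com/file/d/<id>[/view].

    Hypothetical helper; only the /file/d/<id> link style is supported.
    """
    parts = [p for p in urlparse(share_url).path.split("/") if p]
    # expected path segments: ["file", "d", "<id>", ...]
    if len(parts) >= 3 and parts[0] == "file" and parts[1] == "d":
        return parts[2]
    raise ValueError("not a /file/d/<id> style link: {}".format(share_url))
```

The returned string can be passed as the first argument to download_file_from_google_drive in place of the hard-coded file_id.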
pinocchio
(Rene Sandoval)
March 28, 2020, 9:54pm
6