I ran into the same problem today.
I tried to access the file link directly, but that fails too, saying "The requested URL could not be retrieved".
I guess the database server is down or their network is broken.
Hopefully it recovers soon… or are there any alternatives?
I am having exactly the same problem. I am new to NLP so I am not sure what to do. Perhaps somebody knows another source of English/German (or another language) word pairs that we can use instead. I will check back later.
Status:
Host ``www.quest.dcs.shef.ac.uk`` has not renewed its SSL
certificate, so this dataset cannot be downloaded securely.
References:
* http://www.statmt.org/wmt16/multimodal-task.html
* http://shannon.cs.illinois.edu/DenotationGraph/
He seems to have written a workaround script, but I have not managed to get it to work.
I didn’t know these were being used in a PyTorch tutorial, so we are working on hosting these files elsewhere. Alternatively, if someone understands how the files are being used by torchtext.datasets.Multi30K, would one solution be to re-route the data loading to the Multi30K GitHub repository?
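If re-routing is feasible, here is a minimal sketch of that idea. It assumes a reasonably recent torchtext (roughly 0.12 or later), where the `multi30k` module exposes module-level `URL` and `MD5` dicts; older "legacy" torchtext versions work differently. The `MIRROR` address below is a placeholder, not a real host.

```python
# Sketch: point torchtext's Multi30k loader at a different host.
# Assumes torchtext >= 0.12, where torchtext/datasets/multi30k.py exposes
# module-level URL and MD5 dicts. MIRROR is a placeholder, not a real mirror.
from torchtext.datasets import multi30k, Multi30k

MIRROR = "https://example.org/multi30k"  # replace with an actual mirror

multi30k.URL["train"] = f"{MIRROR}/training.tar.gz"
multi30k.URL["valid"] = f"{MIRROR}/validation.tar.gz"
# ...and multi30k.URL["test"] likewise if you need the test split.
# If the mirrored archives are not byte-identical to the originals, the
# corresponding entries in multi30k.MD5 also have to be updated.

# After patching, the dataset is used as usual:
train_iter = Multi30k(split="train", language_pair=("de", "en"))
src, tgt = next(iter(train_iter))
```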
This code uses the same Multi30K dataset. I was able to get it working by using another data file. The basic idea is that the training, validation, and test sets are all lists of tuples, where each tuple is a sentence pair in the two languages. This is nice because it makes it easy to create any language pairing you would like. Here is my implementation in Colab, along with lots of notes:
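For readers who just want the shape of the idea, here is a minimal sketch (not the Colab notebook itself): each split is built as a list of (source, target) sentence tuples from two line-aligned text files. The file names are assumptions; adjust them to wherever your data lives.

```python
from pathlib import Path

def load_pairs(src_path, tgt_path):
    """Zip two line-aligned text files into a list of (src, tgt) sentence tuples."""
    src_lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt_lines = Path(tgt_path).read_text(encoding="utf-8").splitlines()
    assert len(src_lines) == len(tgt_lines), "parallel files must be line-aligned"
    return [(s.strip(), t.strip()) for s, t in zip(src_lines, tgt_lines)]

# Any language pairing works, as long as you have the two aligned files.
train_data = load_pairs("train.en", "train.de")
valid_data = load_pairs("val.en", "val.de")
test_data  = load_pairs("test_2016_flickr.en", "test_2016_flickr.de")

print(train_data[0])  # first (English, German) sentence pair
```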
If you do not want to update the URL of Multi30k, you can just download the file from the URL above and put the tar.gz file into the torch cache directory. On my machine, the directory is /root/.cache/torch/text/datasets/Multi30k. Copy the tar.gz file into that directory and run the code; PyTorch will uncompress the file and produce the train.de and train.en files.
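A small sketch of that manual-cache workaround. The cache path varies by machine (on the poster's machine it was /root/.cache/torch/text/datasets/Multi30k), and the "downloads" folder below is just an assumed location for the manually downloaded archives.

```python
import shutil
from pathlib import Path

# Cache location varies by environment; adjust to where torchtext looks on
# your system (e.g. /root/.cache/torch/text/datasets/Multi30k in the post above).
CACHE_DIR = Path.home() / ".cache" / "torch" / "text" / "datasets" / "Multi30k"
CACHE_DIR.mkdir(parents=True, exist_ok=True)

# Copy the manually downloaded archives into the cache so torchtext finds
# them instead of trying (and failing) to download from the original host.
for archive in ("training.tar.gz", "validation.tar.gz"):
    shutil.copy(Path("downloads") / archive, CACHE_DIR / archive)
```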