Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None

Hello,

Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None.

The error above appears when calling build_vocab_from_iterator() here:

vocab_src = build_vocab_from_iterator(
    yield_tokens(train + val + test, tokenize_de, index=0),
    min_freq=2,
    specials=["<s>", "</s>", "<blank>", "<unk>"],
)

I am using Google Colab to run a notebook from this repo. Link to the notebook: Google Colab

Thank you in advance.


The problem comes from the Multi30k dataset (source URL: http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz), which is not accessible right now.

Note: the torchtext.vocab.build_vocab_from_iterator() call in the Google Colab notebook above iterates over this dataset, so that is where the download is triggered and where the error shows up. (Sorry for not being specific in describing the problem.)
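To check whether the failure is on the server side rather than in the notebook, one can probe the URL directly before building the vocabulary. This is a minimal sketch using only the Python standard library; the helper name `is_url_reachable` and the default timeout are illustrative assumptions, not part of torchtext or the notebook.

```python
import urllib.error
import urllib.request


def is_url_reachable(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical diagnostic helper: send a HEAD request to `url`.

    Returns True only if the server answers with a 2xx/3xx status.
    It does not download the file.
    """
    try:
        # Building the Request inside the try: a malformed URL raises ValueError here.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # URLError covers DNS failures, refused connections and HTTP errors
        # (HTTPError is a subclass); OSError also covers socket timeouts.
        return False
```

For example, `is_url_reachable("http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz")` returning False would point to the host rather than the notebook code. Some servers reject HEAD requests, so a False result is a hint, not proof that the file is gone.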


I've been experiencing the same problem for some time, and can't seem to find a solution to it.

I think you could use other datasets instead.
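Switching datasets works because the vocabulary step only needs an iterable of sentence pairs; build_vocab_from_iterator() itself is not tied to Multi30k. The sketch below is a plain-Python stand-in, not torchtext's implementation: `build_vocab` is a hypothetical helper that roughly mirrors how build_vocab_from_iterator() applies `min_freq` and `specials` (specials first, then tokens by descending frequency, ties broken alphabetically), `yield_tokens` is assumed to have the same shape as the notebook's helper, and `str.split` stands in for the real German tokenizer.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple


def yield_tokens(
    pairs: Iterable[Tuple[str, str]], tokenizer: Callable[[str], List[str]], index: int
) -> Iterator[List[str]]:
    # Tokenize one side of each (source, target) pair; index=0 selects the source.
    for pair in pairs:
        yield tokenizer(pair[index])


def build_vocab(
    token_lists: Iterable[Sequence[str]], min_freq: int = 1, specials: Sequence[str] = ()
) -> Dict[str, int]:
    # Hypothetical stand-in: count tokens, keep those seen at least min_freq times.
    counter: Counter = Counter()
    for tokens in token_lists:
        counter.update(tokens)
    itos = list(specials)  # special tokens get the lowest indices
    for token, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and token not in specials:
            itos.append(token)
    return {token: i for i, token in enumerate(itos)}


# Any local parallel corpus can replace the Multi30k splits.
pairs = [("ein Hund läuft", "a dog runs"), ("ein Hund schläft", "a dog sleeps")]
vocab = build_vocab(
    yield_tokens(pairs, str.split, index=0),
    min_freq=2,
    specials=["<s>", "</s>", "<blank>", "<unk>"],
)
# vocab == {"<s>": 0, "</s>": 1, "<blank>": 2, "<unk>": 3, "Hund": 4, "ein": 5}
```

With torchtext installed, the same `yield_tokens(...)` generator over a local corpus can be passed to build_vocab_from_iterator() in place of `train + val + test`.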