How to download torchtext datasets once?


I’m experimenting with torchtext datasets e.g.:

train, test = torchtext.datasets.IMDB(PATH, split=('train', 'test'))

Unfortunately there is no download=True/False option on the dataset, which means that every time I run the script it seems to download the entire dataset from the internet again!

Is there a way to open the dataset from a local folder once it’s there, i.e. download it only once?


Pass root as an argument, e.g.

train, test = torchtext.datasets.IMDB(root='/home/user/datasets', split=('train', 'test'))

If it exists, it won’t download again.

Are you sure?
Because I’m already passing PATH as root in the code above, but when I run the line it takes longer than expected and it changes the timestamp on the PATH folder. That makes me think it’s either downloading again or rebuilding the train/test split … it’s definitely doing something!

Try disconnecting from the internet and see if that line runs fine.
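
If going offline is inconvenient, another rough check is to snapshot the files under the root folder before and after the call and see which ones changed. The sketch below uses only the standard library; snapshot and changed_files are hypothetical helper names made up for this post, not part of torchtext.

```python
from pathlib import Path


def snapshot(root):
    """Map each file under root to its (size, modification time in ns)."""
    root = Path(root)
    if not root.is_dir():
        # Nothing cached yet, so there is nothing to record.
        return {}
    result = {}
    for path in root.rglob("*"):
        if path.is_file():
            stat = path.stat()
            result[str(path.relative_to(root))] = (stat.st_size, stat.st_mtime_ns)
    return result


def changed_files(before, after):
    """Return sorted names of files that were added, removed, or rewritten."""
    names = set(before) | set(after)
    return sorted(name for name in names if before.get(name) != after.get(name))


# Hypothetical usage with the call from this thread (PATH as in the question):
# before = snapshot(PATH)
# train, test = torchtext.datasets.IMDB(root=PATH, split=('train', 'test'))
# after = snapshot(PATH)
# print(changed_files(before, after))
```

An empty list on the second run would suggest the cached files were reused. If the downloaded archive shows up in the list every time, it is probably being fetched again. Depending on the torchtext version, the work may only happen when the dataset is first iterated, so it may be worth reading one example before taking the second snapshot.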
