I am trying to use OpenNMT on the WMT German-English dataset. This dataset is about 600 MB.
After I preprocess the data, I try to save them with the syntax
This results in a MemoryError that I am not sure how to fix. I am aware that there are some issues with torch.save when the models are stored on the GPU (at least according to some of the questions on this forum), but I am still in the pre-processing phase, so everything is still on the CPU.
Well, this should be avoided.
In their example it is used at the very end of the script, which is fine because the files are closed when Python quits, but in general it is not good practice.
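To illustrate the point about closing files, here is a minimal sketch that opens the output file in a `with` block, so it is closed as soon as the save finishes instead of when Python quits. It uses `pickle.dump` as a stand-in so that it runs without PyTorch; `torch.save(obj, f)` also accepts an open file object and could be passed as `dump` instead. The helper `save_and_close`, the sample `data`, and the file name are hypothetical and not taken from OpenNMT. This only addresses the open file handle and may not fix the MemoryError by itself.

```python
import os
import pickle
import tempfile


def save_and_close(obj, path, dump=pickle.dump):
    """Serialize obj to path; the file is closed even if dump raises."""
    with open(path, "wb") as f:
        dump(obj, f)  # with PyTorch installed, dump could be torch.save
    return f  # returned only so the caller can inspect f.closed


# Hypothetical stand-in for the preprocessed dataset.
data = {"src": [[1, 2, 3]], "tgt": [[4, 5]]}
path = os.path.join(tempfile.mkdtemp(), "demo.train.pt")

handle = save_and_close(data, path)

# Read the file back to confirm it was fully written and flushed.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

If you pass a file name directly to `torch.save` instead of an open file object, it opens and closes the file itself, which avoids the problem as well.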