I am trying to use OpenNMT on the WMT German-English dataset. This dataset is about 600 MB.
After preprocessing the data, I try to save it with:
torch.save(train, open('trainFile', 'wb'), pickle_module=dill)
This results in a MemoryError that I am not sure how to fix. I am aware that there are some issues with torch.save when the models are stored on the GPU (at least according to some of the questions on this forum), but I am still in the preprocessing phase, so everything is still on the CPU.
How can I solve the problem?
I’m having the same problem. Still waiting for a solution.
You should not pass an open file object like that, because it is never explicitly closed. Pass the path instead:
torch.save(train, 'trainFile', pickle_module=dill)
The official code uses
torch.save(train, open(opt.save_data + '.train.pt', 'wb'))
Well, this should be avoided.
In their example the call comes at the very end of the script, so it is fine: when Python exits, the files are closed. In general, though, it is not good practice.
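If you do need to hand over a file object rather than a path, a with block closes it for you as soon as the write finishes. Here is a minimal sketch using the standard pickle module as a stand-in, since torch.save(obj, f) takes the same (object, file) arguments. The helper name save_and_close is made up for illustration.

```python
import pickle


def save_and_close(obj, path, dump=pickle.dump):
    """Write obj to path and make sure the file handle is closed afterwards.

    `dump` is any callable taking (obj, file_object); pickle.dump is used
    here as a stand-in, and torch.save could be passed in its place.
    """
    with open(path, "wb") as f:
        dump(obj, f)
    # Leaving the with block closes the file, even if dump() raised.
    return f.closed
```

With PyTorch installed this would be called as save_and_close(train, 'trainFile', dump=torch.save).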
I changed that line to
torch.save(train, opt.save_data + '.train.pt', pickle_module=dill)
but it says:
NameError: name 'dill' is not defined
If you’re not specifically using dill, just leave the default value:
torch.save(train, opt.save_data + '.train.pt')
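If dill is actually needed, the NameError only means the script never imported it. A small sketch, assuming dill may or may not be installed: import it if available and fall back to the standard pickle module otherwise. The variable name pickle_module below is my own choice, picked to match the torch.save keyword.

```python
import pickle

try:
    # dill is a third-party package; it must be installed and imported
    # before its name can be used anywhere in the script.
    import dill as pickle_module
except ImportError:
    # Fall back to the standard library pickler if dill is missing.
    pickle_module = pickle

# Round-trip a small object to check that the chosen pickler works.
data = {"src": [1, 2, 3], "tgt": [4, 5, 6]}
restored = pickle_module.loads(pickle_module.dumps(data))
```

The chosen module can then be passed along as torch.save(train, opt.save_data + '.train.pt', pickle_module=pickle_module).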
Thanks, but the memory error still occurs. It seems that I have to divide the data set into several parts.
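Splitting could look roughly like the sketch below. It assumes the examples sit in a list-like sequence that can be sliced, which may not match how the preprocessing script actually lays out train. The names save_in_shards, pickle_save and the shard file naming are made up for illustration. It only helps if the memory spike comes from serializing everything in one call, not from holding the data itself.

```python
import pickle


def save_in_shards(examples, prefix, shard_size, save_fn):
    """Write examples to numbered files holding at most shard_size items each.

    save_fn is any callable taking (obj, path), e.g. torch.save.
    Returns the list of paths written, in order.
    """
    paths = []
    for start in range(0, len(examples), shard_size):
        path = f"{prefix}.{len(paths)}.pt"
        # Only one slice is serialized at a time.
        save_fn(examples[start:start + shard_size], path)
        paths.append(path)
    return paths


def pickle_save(obj, path):
    """Stand-in for torch.save so the sketch runs without PyTorch."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


# With PyTorch this might be called as:
#   save_in_shards(train, opt.save_data + '.train', 100000, torch.save)
```

The shards then have to be loaded one at a time during training as well, otherwise the same memory problem comes back at load time.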