I am trying to make a headline generator, but when I initialise my model it keeps crashing. I checked my memory usage and saw that it was using 50 GB of memory.
With a vocab size of 102,577 you have (102,577 + 128) * 128 parameters per linear layer.
This gives a total of 3 * (102,577 + 128) * 128 = 39,438,720 parameters. Assuming you are training with 32-bit precision, this should take about 150 MB in total. That is a lot, but acceptable, even for 8 GB of RAM. There are networks out there that are MUCH bigger.
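As a sanity check, that arithmetic can be reproduced in a couple of lines (the three linear layers correspond to the i2h/i2o/o2o layers of the tutorial's RNN):

```python
# Rough parameter/memory estimate for the tutorial RNN
# (hidden size 128, vocab size 102,577, three linear layers).
vocab_size = 102_577
hidden_size = 128

params_per_layer = (vocab_size + hidden_size) * hidden_size
total_params = 3 * params_per_layer        # three linear layers
bytes_fp32 = total_params * 4              # 4 bytes per float32 weight

print(total_params)                        # 39438720
print(round(bytes_fp32 / 1024**2))         # ~150 (MB)
```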
What is your batch size?
There are many common preprocessing techniques, e.g.:
- case-folding
- stemming/lemmatization
- normalization (e.g., removal of numbers, emoji, emoticons, punctuation marks)
- removal of rare words (optionally replacing them with a special “unknown” token)
- subword tokenization
Which steps are appropriate depends on your task. Since you are trying to generate text, stemming/lemmatization is probably out. Maybe this notebook gives some ideas.
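As a rough sketch of the first few steps (the `min_count` threshold and the `<unk>` token name are illustrative choices, not prescriptions):

```python
import re
from collections import Counter

def preprocess(texts, min_count=2):
    """Case-fold, strip digits/punctuation, and replace rare words with <unk>."""
    tokenized = []
    for text in texts:
        text = text.lower()                    # case-folding
        text = re.sub(r"[^a-z\s]", " ", text)  # normalization: drop digits/punctuation
        tokenized.append(text.split())

    counts = Counter(tok for toks in tokenized for tok in toks)
    return [[tok if counts[tok] >= min_count else "<unk>" for tok in toks]
            for toks in tokenized]

print(preprocess(["Stocks rise 5%!", "Stocks fall."]))
# [['stocks', '<unk>'], ['stocks', '<unk>']]
```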
You can try converting your notebook to a .py script and running it. Jupyter might also have settings to restrict the memory a notebook/kernel is allowed to use:
jupyter nbconvert --to script mynotebook.ipynb
And then run
python mynotebook.py
I was also wondering what you’re trying to do. From your notebook it looks like you want to predict the headline given a short description of a news article. Is this correct? Because if so, I’m wondering about this snippet:
for i in range(input_line_tensor.size(0)):
    output, hidden = rnn(input_line_tensor[i], hidden)
    l = criterion(output, target_line_tensor[i])
    loss += l
This assumes that input_line_tensor and target_line_tensor have the same length.
Your network is suitable for a sequence-labeling task. However, if you want to generate a headline from a news article as input, then this is a sequence-to-sequence problem, and you need an encoder-decoder architecture.
I’m pretty new to machine learning and am adapting this tutorial with my own code. The aim is to generate a headline given the description of the article. The code you showed was copied from the tutorial; I had not edited it yet.
I ran the notebook as a script, and after reaching 55 GB of RAM it gave me the error:
Killed: 9
I feel like it might be a memory leak, because as @bloos said, it shouldn’t be using that much RAM.
This is a very different task from the one done there. Look at what the input and target of a single training sample look like:
The input is a name, and the target is the same name shifted by one letter to the left. In short, this network’s goal is to train a language model, which will then generate a name given a starting sequence of letters.
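That shift can be illustrated in a couple of lines (the `<EOS>` marker here stands in for the tutorial’s end-of-sequence token):

```python
name = "Anna"

# Input: every letter; target: the next letter, ending with an EOS marker.
inputs = list(name)                   # ['A', 'n', 'n', 'a']
targets = list(name[1:]) + ["<EOS>"]  # ['n', 'n', 'a', '<EOS>']

# Both sequences have the same length, which is why the training loop
# can iterate over input and target tensors in lockstep.
print(len(inputs) == len(targets))    # True
```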
What you want is a Seq2Seq model similar to Machine Translation, where the input is a text (e.g., a short summary of a news article) and the output is a new text (e.g., a headline).
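A minimal sketch of such an encoder-decoder in PyTorch (sizes are illustrative; a real model would add attention, teacher forcing, padding/masking, etc.):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, src):                  # src: (batch, src_len)
        _, hidden = self.gru(self.embed(src))
        return hidden                        # fixed-size summary of the description

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tgt, hidden):          # tgt: (batch, tgt_len)
        output, hidden = self.gru(self.embed(tgt), hidden)
        return self.out(output), hidden      # logits over the headline vocab

# Input (description) and output (headline) may have different lengths.
enc = Encoder(vocab_size=1000, hidden_size=64)
dec = Decoder(vocab_size=1000, hidden_size=64)
src = torch.randint(0, 1000, (2, 30))        # 2 descriptions, 30 tokens each
tgt = torch.randint(0, 1000, (2, 8))         # 2 headlines, only 8 tokens each
logits, _ = dec(tgt, enc(src))
print(logits.shape)                          # torch.Size([2, 8, 1000])
```

Note that the decoder is conditioned only through the encoder’s final hidden state, so the source and target sequence lengths are completely decoupled, unlike in the character-RNN tutorial loop above.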