With a vocab size of 102,577, you have (102,577 + 128) * 128 parameters per linear layer.
This gives a total of 3 * (102,577 + 128) * 128 = 39,438,720 parameters. Assuming you are training with 32-bit precision, this should take about 150 MB in total. That is a lot, but acceptable, even for 8 GB of RAM. There are networks out there that are MUCH bigger.
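A quick way to sanity-check that arithmetic in Python (a sketch; the layer shapes follow the rough estimate above, not the tutorial's exact architecture):

```python
vocab_size = 102_577
hidden_size = 128

# Three linear layers, each with a (vocab + hidden) x hidden weight matrix,
# per the estimate above.
total_params = 3 * (vocab_size + hidden_size) * hidden_size
memory_mb = total_params * 4 / 1024**2  # 4 bytes per float32 parameter

print(total_params)      # 39438720
print(round(memory_mb))  # ~150
```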
What is your batch size?
You can try converting your notebook to a .py script and running it (Jupyter might also have settings to restrict the memory a notebook/kernel is allowed to use):
```bash
jupyter nbconvert --to script mynotebook.ipynb
```
and then run the resulting script with `python mynotebook.py`.
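If you want a hard cap so a runaway script fails fast instead of eating all your RAM, one option is Python's resource module (a sketch; Linux only, and the 8 GB limit is just an example value):

```python
import resource

# Cap this process's address space at ~8 GB. Allocations beyond the
# limit raise MemoryError instead of pushing the machine into swap.
limit_bytes = 8 * 1024**3
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
```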
I was also wondering what you’re trying to do. From your notebook, it looks like you want to predict the headline given a short description of a news article. Is this correct? Because if so, I’m wondering about this snippet:
```python
for i in range(input_line_tensor.size(0)):
    output, hidden = rnn(input_line_tensor[i], hidden)
    l = criterion(output, target_line_tensor[i])
    loss += l
```
This assumes that input_line_tensor and target_line_tensor have the same length.
Your network is suitable for a sequence-labeling task. However, if you want to generate a headline from a news article as input, then this is a sequence-to-sequence task, and you need an encoder-decoder architecture.
I’m pretty new to machine learning and am replacing parts of this tutorial with my own code. The aim is to generate a headline given the description of the article. The code you showed was copied from the tutorial; I hadn’t edited it yet.
I ran the notebook as a script, and after reaching 55 GB of RAM it crashed with an error.
I feel like it might be a memory leak because, as @bloos said, it shouldn’t be using that much RAM.
This is a very different task from the one done here. Look at what the input and target of a single training sample look like:
The input is a name, and the target is the same name shifted by one letter to the left. In short, this network’s goal is to train a language model that will then generate a name given a starting sequence of letters.
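For example, with a made-up name (following the tutorial's `<EOS>` convention):

```python
name = "Albert"

# Input: all letters of the name; target: the same letters shifted one
# position to the left, with <EOS> appended at the end.
input_seq = list(name)                   # ['A', 'l', 'b', 'e', 'r', 't']
target_seq = list(name)[1:] + ["<EOS>"]  # ['l', 'b', 'e', 'r', 't', '<EOS>']
```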
What you want is a Seq2Seq model similar to Machine Translation, where the input is a text (e.g., a short summary of a news article) and the output is a new text (e.g., a headline).
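To make the difference concrete, here is a minimal encoder-decoder sketch (all class names and sizes are made up for illustration; a real model would add attention, batching, and padding/masking):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, src):  # src: (src_len,) tensor of token ids
        embedded = self.embedding(src).unsqueeze(1)  # (src_len, 1, hidden)
        _, hidden = self.gru(embedded)
        return hidden  # fixed-size summary of the whole description

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, token, hidden):  # one headline token at a time
        embedded = self.embedding(token).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return self.out(output.squeeze(0)), hidden
```

The key point is that the decoder starts from the encoder's final hidden state and emits tokens one at a time until it produces an end-of-sequence token, so the headline's length is independent of the description's length.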