Model using too much memory when initialising

Walter_White · September 29, 2023, 10:25am

I am trying to make a headline generator. But when I initialise my model it keeps on crashing. I checked my memory usage and saw that it was using 50GB of memory.

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size

        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.o2o = nn.Linear(hidden_size + output_size, output_size)
        self.dropout = nn.Dropout(0.1)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        input_combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(input_combined)
        output = self.i2o(input_combined)
        output_combined = torch.cat((hidden, output), 1)
        output = self.o2o(output_combined)
        output = self.dropout(output)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

Here is how I initialise it

rnn = RNN(len(vocab.get_itos()), 128, len(vocab.get_itos()))

The length of the vocab is 102,577. I don’t know if that’s too much or if my computer is just bad. If it is too big, what should I do to reduce it?

bloos · September 29, 2023, 12:34pm

What exceptions or errors do you get? What are your system specs?

vdw · September 29, 2023, 12:43pm

What happens if you try just for testing:

rnn = RNN(100, 128, 100)

Walter_White · September 29, 2023, 1:08pm

I don’t know how to check from a Jupyter notebook, but I am on a 2017 Intel MacBook Air with 8GB of RAM.

Walter_White · September 29, 2023, 1:11pm

It runs instantly with no errors. As I said, I think it might be the large vocab. How can I decrease it?

bloos · September 29, 2023, 1:43pm

With a vocab size of 102577, and assuming every linear layer has 128 outputs, you have (102577+128)*128 parameters per linear layer.

This gives a total of 3*(102,577+128)*128 = 39,438,720 parameters. Assuming you are training with 32-bit precision, this should take about 150 MB in total. That is a lot, but acceptable, even for 8GB of RAM. There are networks out there that are MUCH bigger.
What is your batch size?

bloos · September 29, 2023, 1:48pm

You could train only on lower-case letters. This should bring down your vocabulary a bit. But I would recommend using a proper tokenizer.

vdw · September 29, 2023, 1:55pm

There are many common preprocessing techniques, e.g.:

case-folding
stemming/lemmatization
normalization (e.g., removal of numbers, emoji, emoticons, punctuation marks)
removal of rare words (optional: replacing with a special “unknown” token)
subword tokenization

Which steps are appropriate depends on your task. Since you are trying to generate text, stemming/lemmatization is probably out. Maybe this notebook gives some ideas.
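Two of the steps listed above, case-folding and removal of rare words with an "unknown" token, can be sketched in a few lines. This is only an illustration under assumptions: whitespace splitting stands in for a real tokenizer, and `build_vocab`, `encode`, `min_freq` and `<unk>` are hypothetical names, not part of torchtext or the original notebook.

```python
from collections import Counter

def build_vocab(texts, min_freq=2, unk_token="<unk>"):
    """Case-fold, count words, and keep only words seen at least min_freq times."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Deterministic order: most frequent first, ties broken alphabetically.
    kept = sorted((w for w, c in counts.items() if c >= min_freq),
                  key=lambda w: (-counts[w], w))
    itos = [unk_token] + kept                       # index -> word
    stoi = {word: index for index, word in enumerate(itos)}  # word -> index
    return itos, stoi

def encode(text, stoi, unk_token="<unk>"):
    """Map a text to indices; words outside the vocabulary become unk_token."""
    unk = stoi[unk_token]
    return [stoi.get(word, unk) for word in text.lower().split()]

# Toy data (made up for illustration).
headlines = [
    "Stocks rally as markets open",
    "Markets open higher",
    "stocks fall as markets close",
]
itos, stoi = build_vocab(headlines, min_freq=2)
print(itos)
print(encode("Stocks soar as markets open", stoi))
```

Raising `min_freq` shrinks the vocabulary at the cost of mapping more words to the unknown token, so the threshold is a trade-off to tune on the actual data.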

Walter_White · September 29, 2023, 10:30pm

For me the kernel only crashes with the code above

Walter_White · September 30, 2023, 5:02am

I uploaded the code to Kaggle.

vdw · September 30, 2023, 6:50am

You can try to convert your notebook to a .py script and run it. Jupyter might have settings to restrict the memory a notebook/kernel is allowed to use:

jupyter nbconvert --to script mynotebook.ipynb

And then run

python mynotebook.py

I was also wondering what you’re trying to do. From your notebook it looks like you want to predict the headline given a short description of a news article. Is this correct? Because if so, I am wondering about this snippet:

    for i in range(input_line_tensor.size(0)):
        output, hidden = rnn(input_line_tensor[i], hidden)
        l = criterion(output, target_line_tensor[i])
        loss += l

This assumes that input_line_tensor and target_line_tensor have the same length.

Your network is suitable for a sequence-labeling task. However, if you want to generate a headline from a news article as input, then this is a sequence-to-sequence task, and you need an encoder-decoder architecture.

Walter_White · September 30, 2023, 8:22am

I’m pretty new to machine learning and am adapting this tutorial with my own code. The aim is to generate a headline given the description of the article. The code you showed was copied from the tutorial and I had not edited it yet.

I ran the notebook as a script, and after reaching 55GB of RAM it gave me this error:

Killed: 9

I feel like it might be a memory leak because, as @bloos said, it shouldn’t be using that much RAM.
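One way to tell a leak from sheer model size is to count the parameters of each layer from its shape alone, without allocating anything. The sketch below assumes the layer shapes are exactly those in the RNN class above, that each linear layer has a bias (the `nn.Linear` default), and 4 bytes per float32 parameter; `linear_params` and `rnn_param_counts` are hypothetical helpers, not PyTorch functions. Note that i2o and o2o map to output_size (the vocabulary size) rather than hidden_size.

```python
def linear_params(in_features, out_features):
    """Parameters of a fully connected layer: one weight per input/output pair plus one bias per output."""
    return in_features * out_features + out_features

def rnn_param_counts(input_size, hidden_size, output_size):
    """Per-layer parameter counts for the three linear layers of the RNN class."""
    return {
        "i2h": linear_params(input_size + hidden_size, hidden_size),
        "i2o": linear_params(input_size + hidden_size, output_size),
        "o2o": linear_params(hidden_size + output_size, output_size),
    }

vocab_size = 102_577  # vocabulary length reported in the first post
counts = rnn_param_counts(vocab_size, 128, vocab_size)
total = sum(counts.values())

BYTES_PER_FLOAT32 = 4
for name, n in counts.items():
    print(f"{name}: {n:,} parameters, {n * BYTES_PER_FLOAT32 / 1e9:.2f} GB as float32")
print(f"total: {total:,} parameters, {total * BYTES_PER_FLOAT32 / 1e9:.2f} GB as float32")
```

Under these assumptions i2h is small (about 13 million parameters), but i2o and o2o each hold about 10.5 billion parameters, roughly 42 GB apiece in float32. The size grows with the square of the vocabulary, so the weights alone would explain the process being killed, without any leak.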

vdw · September 30, 2023, 10:40am

The tutorial does a very different task. Look at what the input and target of a single training sample look like:

[image: character-rnn]

The input is a name, and the target is the same name shifted by one letter to the left. In short, this network’s goal is to train a language model which will then generate a name given a start sequence of letters.
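The shifted input/target pair described above can be reproduced in plain Python. This is a minimal sketch rather than the tutorial's own code: `make_lm_pair` and the `<EOS>` marker are hypothetical names, and real training code would turn the characters into tensors.

```python
EOS = "<EOS>"  # hypothetical end-of-sequence marker

def make_lm_pair(name):
    """Build (input, target) for next-character prediction on one name."""
    chars = list(name)
    inputs = chars                # every character of the name
    targets = chars[1:] + [EOS]   # the next character at each position, then EOS
    return inputs, targets

inputs, targets = make_lm_pair("Anna")
for step, (x, y) in enumerate(zip(inputs, targets)):
    print(step, x, "->", y)
```

Because the target is the input shifted by one position, both sequences always have the same length, which is why the training loop can index them with a single counter. A description and its headline have no such alignment.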

What you want is a Seq2Seq model similar to Machine Translation, where the input is a text (e.g., a short summary of a news article) and the output is a new text (e.g., a headline).

Walter_White · September 30, 2023, 11:09pm

Oh, I thought I could change it from letters to words and generate words. Anyway, I’ll try the Seq2Seq model.