[Solved] Training a simple RNN

I see. Thanks for the additional information. What you are doing seems correct. Although it is not the most efficient speed-wise, the performance should not degrade. So I’m curious whether you can provide the full script. A reference script in tf/keras/theano/etc. would be helpful as well.

For your second question, it doesn’t break autograd because you don’t need the overwritten values to compute any gradients. The same reasoning applies to in-place ReLU, etc.
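
To make that concrete, here is a minimal sketch (names and shapes are made up for illustration) of why an in-place ReLU is fine: the values it overwrites are not needed by any backward computation.

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x * 2        # backward of this op only needs the constant 2, not y's values
y.relu_()        # in-place ReLU overwrites y; the old values are not needed anywhere
loss = y.sum()
loss.backward()  # works: no gradient computation depends on the overwritten values
print(x.grad)
```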

Hi,

OK, thank you for taking the time to understand and answer this.

I will clean up the code and put it here.
In the meantime, you can take a look at the bottom of this discussion:

The script there is a bit different, but it shows why I want to process sequences by hand: I want to re-inject the predicted labels as input to the network and embed them just like words.
So, together with the word embeddings and character features, the input to the hidden layer also contains label embeddings.
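
To give a rough idea of what I mean, here is a sketch (module names and sizes are invented for the example, this is not my actual script): at each step the previously predicted label is embedded and concatenated with the word embedding and the character-level features before being fed to the recurrent cell.

```python
import torch
import torch.nn as nn

class Tagger(nn.Module):
    """Rough sketch only: names and sizes are made up, not the real script."""
    def __init__(self, n_words, n_labels, w_dim, c_dim, l_dim, h_dim):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, w_dim)
        self.label_emb = nn.Embedding(n_labels, l_dim)
        self.cell = nn.RNNCell(w_dim + c_dim + l_dim, h_dim)
        self.out = nn.Linear(h_dim, n_labels)

    def forward(self, words, char_feats):
        # words: (seq_len,) word indices; char_feats: (seq_len, c_dim)
        h = char_feats.new_zeros(1, self.cell.hidden_size)
        prev_label = words.new_zeros(1)          # index of a "start" label
        scores = []
        for t in range(words.size(0)):
            x = torch.cat([self.word_emb(words[t:t + 1]),
                           char_feats[t:t + 1],
                           self.label_emb(prev_label)], dim=1)
            h = self.cell(x, h)
            logits = self.out(h)
            scores.append(logits)
            prev_label = logits.argmax(dim=1)    # re-inject the predicted label
        return torch.cat(scores, dim=0)

model = Tagger(n_words=1000, n_labels=10, w_dim=50, c_dim=20, l_dim=5, h_dim=100)
scores = model(torch.randint(0, 1000, (7,)), torch.randn(7, 20))  # -> (7, 10)
```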

I did that a few days ago, but I got no answer; I guess that’s because the discussion is a bit old…
I hope it will be clear enough; otherwise I will come back soon with the cleaned-up script.

Thank you in advance in any case

Hi again,

I haven’t finished yet (sorry, I also have other things to do for my job), but while cleaning up the code I realized the following:

The char_rep and hidden_state Variables, which hold the character-level representations and hidden states used later to compute the network output, are local variables of the network’s forward method.

So, after the forward call, they normally go out of scope, don’t they?
If this is right, I don’t see why I’m not getting any error, but it could explain the poor results: back-propagation would not actually be updating all the weights up to the character embeddings.
This may not solve my memory problem, but it could be the explanation for the results.

See you soon

You are right in that they go out of scope. However, there are still references to them from the computation graph, which is not explicitly shown. So they are not deallocated/gc’ed.
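
A tiny example of what I mean (made-up names, nothing to do with your model): the intermediate tensor only exists as a local inside the function, but backward still works because the graph keeps a reference to it.

```python
import torch

def forward(x):
    hidden = torch.tanh(x)   # local intermediate; its value is needed for backward
    return hidden.sum()

x = torch.randn(5, requires_grad=True)
loss = forward(x)            # `hidden` has gone out of scope here...
loss.backward()              # ...but the graph still holds a reference to its value,
print(x.grad)                # so backward works and gradients reach x
```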

By the way, how did you init char_input?

Regarding your memory issue, is the memory usage increasing every iteration? Or does the script just take a large, constant amount of memory?

If it is the latter, it could be related to how you store char_hidden as a class attribute, self.char_hidden, retaining references to the graph. But it shouldn’t matter if there are backward calls that reach this part of the graph, since backward frees the graph…
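
In case it helps, one common way to avoid keeping an old graph alive through such an attribute is to detach it between iterations. A minimal made-up example (not your model, just the pattern):

```python
import torch
import torch.nn as nn

class TinyCharModel(nn.Module):
    """Toy model that keeps a hidden state as a class attribute."""
    def __init__(self):
        super().__init__()
        self.cell = nn.RNNCell(4, 8)
        self.out = nn.Linear(8, 2)
        self.char_hidden = torch.zeros(1, 8)

    def forward(self, x):
        self.char_hidden = self.cell(x, self.char_hidden)
        return self.out(self.char_hidden)

model = TinyCharModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

for step in range(3):
    x = torch.randn(1, 4)
    target = torch.tensor([1])
    model.char_hidden = model.char_hidden.detach()  # drop the reference to the previous graph
    loss = criterion(model(x), target)
    optimizer.zero_grad()
    loss.backward()   # frees the graph built in this iteration
    optimizer.step()
```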

Sorry, I didn’t look into the details of the code in the post you linked. Knowing the answer to my question above should help us determine where to look for the root cause :slight_smile:

Btw, the conversation is getting quite long. Feel free to start a new thread so we don’t disturb others with notifications.

OK, thanks.

I have indeed answered your questions in a new thread:

I was facing the same problem myself. Took me a while to find this thread but it is all clear now. Thanks!