[Solved] Training a simple RNN

I see. Thanks for the additional information. What you are doing seems correct. Although it is not the most efficient speed-wise, the performance should not degrade. So I’m curious whether you can provide the full script. A reference script in tf/keras/theano/etc. would be helpful as well.

For your second question, it doesn’t break autograd because you don’t need the overwritten values to compute any gradients. The same reasoning applies to in-place ReLU, etc.
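
To make that concrete, here is a minimal sketch (names and shapes are made up for illustration) of why an in-place ReLU is fine: the values it overwrites are not needed by any backward computation.

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x * 2        # backward of this op only needs the constant 2, not y's values
y.relu_()        # in-place ReLU overwrites y; the old values are not needed anywhere
loss = y.sum()
loss.backward()  # works: no gradient computation depends on the overwritten values
print(x.grad)
```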

Hi,

OK, thank you for taking the time to understand and answer this.

I will clean up the code and put it here.
In the meantime, you can take a look at the bottom of this discussion:

The script there is a bit different, but it shows why I want to process sequences by hand: I want to re-inject the predicted labels as input to the network and embed them just like words.
So, together with the word embeddings and character features, the input to the hidden layer also contains label embeddings.
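
To give a rough idea of what I mean, here is a sketch (module names and sizes are invented for the example, this is not my actual script): at each step the previously predicted label is embedded and concatenated with the word embedding and the character-level features before being fed to the recurrent cell.

```python
import torch
import torch.nn as nn

class Tagger(nn.Module):
    """Rough sketch only: names and sizes are made up, not the real script."""
    def __init__(self, n_words, n_labels, w_dim, c_dim, l_dim, h_dim):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, w_dim)
        self.label_emb = nn.Embedding(n_labels, l_dim)
        self.cell = nn.RNNCell(w_dim + c_dim + l_dim, h_dim)
        self.out = nn.Linear(h_dim, n_labels)

    def forward(self, words, char_feats):
        # words: (seq_len,) word indices; char_feats: (seq_len, c_dim)
        h = char_feats.new_zeros(1, self.cell.hidden_size)
        prev_label = words.new_zeros(1)          # index of a "start" label
        scores = []
        for t in range(words.size(0)):
            x = torch.cat([self.word_emb(words[t:t + 1]),
                           char_feats[t:t + 1],
                           self.label_emb(prev_label)], dim=1)
            h = self.cell(x, h)
            logits = self.out(h)
            scores.append(logits)
            prev_label = logits.argmax(dim=1)    # re-inject the predicted label
        return torch.cat(scores, dim=0)

model = Tagger(n_words=1000, n_labels=10, w_dim=50, c_dim=20, l_dim=5, h_dim=100)
scores = model(torch.randint(0, 1000, (7,)), torch.randn(7, 20))  # -> (7, 10)
```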

I did that a few days ago, but I got no answer; I guess that’s because the discussion is a bit old…
I hope it will be clear enough; otherwise I will come back soon with the cleaned-up script.

Thank you in advance in any case

Hi again,

I haven’t finished yet (sorry, I also have other things to do for my job), but while cleaning up the code I realized the following:

The char_rep and hidden_state Variables, which hold the character-level representations and hidden states used later to compute the network output, are local variables of the network’s forward method.

So, after the forward call, they normally go out of scope, don’t they?
If this is right, I don’t see why I’m not getting any error, but it could explain the poor results: back-propagation would not actually be updating all the weights up to the character embeddings.
This may not solve my memory problem, but it could be the explanation for the results.

See you soon

You are right in that they go out of scope. However, there are still references to them from the computation graph, which is not explicitly shown. So they are not deallocated/gc’ed.
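
A tiny example of what I mean (made-up names, nothing to do with your model): the intermediate tensor only exists as a local inside the function, but backward still works because the graph keeps a reference to it.

```python
import torch

def forward(x):
    hidden = torch.tanh(x)   # local intermediate; its value is needed for backward
    return hidden.sum()

x = torch.randn(5, requires_grad=True)
loss = forward(x)            # `hidden` has gone out of scope here...
loss.backward()              # ...but the graph still holds a reference to its value,
print(x.grad)                # so backward works and gradients reach x
```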

By the way, how did you init char_input?

Regarding your memory issue, is the memory usage increasing every iteration? Or does the script just take a large, constant amount of memory?

If it is the latter, it could be related to how you store char_hidden as a class attribute, self.char_hidden, retaining references to the graph. But it shouldn’t matter if there are backward calls that reach this part of the graph, since backward frees the graph…
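
In case it helps, one common way to avoid keeping an old graph alive through such an attribute is to detach it between iterations. A minimal made-up example (not your model, just the pattern):

```python
import torch
import torch.nn as nn

class TinyCharModel(nn.Module):
    """Toy model that keeps a hidden state as a class attribute."""
    def __init__(self):
        super().__init__()
        self.cell = nn.RNNCell(4, 8)
        self.out = nn.Linear(8, 2)
        self.char_hidden = torch.zeros(1, 8)

    def forward(self, x):
        self.char_hidden = self.cell(x, self.char_hidden)
        return self.out(self.char_hidden)

model = TinyCharModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

for step in range(3):
    x = torch.randn(1, 4)
    target = torch.tensor([1])
    model.char_hidden = model.char_hidden.detach()  # drop the reference to the previous graph
    loss = criterion(model(x), target)
    optimizer.zero_grad()
    loss.backward()   # frees the graph built in this iteration
    optimizer.step()
```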

Sorry, I didn’t look into the details of the code in the post you linked. Knowing the answer to my question above should help us determine where to look for the root cause :slight_smile:

Btw, the conversation is getting quite long. Feel free to start a new thread so we don’t disturb others with notifications.

OK, thanks.

I have indeed answered your questions in a new thread:

I was facing the same problem myself. Took me a while to find this thread but it is all clear now. Thanks!