I’ve been trying to train a model to classify surnames by language with a char-level RNN.
But over 10 epochs the model shows almost the same train and validation loss.
And in the confusion matrix all predictions fall into a single column.
Here’s my notebook on Colab.
What am I doing wrong?
Can someone give me some advice?
Actually, my notebook is a modified version of that nice notebook, in which there’s no optimizer object and SGD is done by manually subtracting the gradients.
This way, first we zero the net’s gradients:
rnn.zero_grad()
After that we get the output of the net and compute the loss:
loss = criterion(output, category_tensor)
Next we compute the derivatives via backward():
loss.backward()
and finally the manual subtraction:
for p in rnn.parameters():
    p.data.add_(p.grad.data, alpha=-learning_rate)
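Putting those four steps together, a minimal self-contained sketch of such a training step might look like this (the RNN architecture, sizes, and learning rate here are placeholders for illustration, not the notebook’s exact values):

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration only
n_letters, n_hidden, n_categories = 57, 128, 18
learning_rate = 0.005

class RNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.i2h = nn.Linear(n_letters + n_hidden, n_hidden)
        self.i2o = nn.Linear(n_letters + n_hidden, n_categories)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x, hidden):
        combined = torch.cat((x, hidden), 1)
        return self.softmax(self.i2o(combined)), self.i2h(combined)

rnn = RNN()
criterion = nn.NLLLoss()

def train_step(category_tensor, line_tensor):
    hidden = torch.zeros(1, n_hidden)
    rnn.zero_grad()                        # 1. zero the gradients
    for i in range(line_tensor.size(0)):   # 2. run the net over the name, char by char
        output, hidden = rnn(line_tensor[i], hidden)
    loss = criterion(output, category_tensor)
    loss.backward()                        # 3. compute derivatives
    for p in rnn.parameters():             # 4. manual SGD update
        p.data.add_(p.grad.data, alpha=-learning_rate)
    return output, loss.item()

# Usage: a fake 5-character "name" as a one-hot placeholder tensor
line = torch.zeros(5, 1, n_letters)
target = torch.tensor([3])
out, l = train_step(target, line)
```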
What I changed is that I replaced the random choice of data points with train/validation datasets and added loss output.
But I can’t find the place where I broke something.
Can you please take another look at it?
I am not really familiar with RNNs, but looking at your code, I see hidden.detach_(), which detaches the tensor from the computation graph in place and may therefore cause your error. Here’s a link that explains what detach() does.
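A tiny toy example (not the OP’s model) shows the effect: after an in-place detach_(), backward() can no longer reach anything that produced the tensor, so upstream parameters stop receiving gradients.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# Without detach: the gradient flows back through h to w
h = w * 3.0
(h * 4.0).backward()
assert w.grad.item() == 12.0

# With in-place detach: h is severed from the graph, so anything
# built on it carries no gradient back to w
w.grad = None
h = w * 3.0
h.detach_()
assert h.requires_grad is False
out = h * 4.0
assert out.requires_grad is False  # calling backward() here would update nothing
```

This is why detaching the hidden state *between* independent samples is harmless, but detaching it *inside* the per-character loop would block learning.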
Yes, I did.
But there was no splitting of the dataset into train/valid, and data points were chosen at random.
So I only added the data splitting.
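Since the split is the only change, it’s worth checking that it didn’t skew the class balance. A hedged sketch of a per-class (stratified) split — the `(name, language)` pair format is an assumption, not necessarily the notebook’s:

```python
import random
from collections import defaultdict

def stratified_split(data, valid_frac=0.2, seed=0):
    """Split (name, language) pairs so every language appears in both sets."""
    by_class = defaultdict(list)
    for name, lang in data:
        by_class[lang].append((name, lang))
    rng = random.Random(seed)
    train, valid = [], []
    for lang, items in by_class.items():
        rng.shuffle(items)
        k = max(1, int(len(items) * valid_frac))  # at least one validation item per class
        valid.extend(items[:k])
        train.extend(items[k:])
    return train, valid

# Usage with toy data: 10 names per language
data = [("Sato", "Japanese")] * 10 + [("Smith", "English")] * 10
train, valid = stratified_split(data)
```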
The most confusing part is that only one category column is “active” in the confusion matrix (screenshots above) after every training run.
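A single active column means the net predicts the same class for everything, which is easy to confirm with a prediction histogram before building the full confusion matrix. A small sketch (the log-softmax outputs here are fake stand-ins for the model’s):

```python
import torch
from collections import Counter

def prediction_histogram(logits_list):
    """Count the argmax class over a list of (1, n_categories) outputs."""
    preds = [int(torch.argmax(lg, dim=1)) for lg in logits_list]
    return Counter(preds)

# Three fake outputs that all strongly favour class 2 --
# a collapsed model yields a single-key histogram like this
fake_outputs = [torch.log_softmax(torch.tensor([[0.1, 0.2, 3.0]]), dim=1)
                for _ in range(3)]
hist = prediction_histogram(fake_outputs)  # Counter({2: 3})
```

If the histogram on the real validation set has one dominant key, the usual suspects are a too-high learning rate, a class-imbalanced split, or gradients not actually reaching the parameters (e.g. the detach_() issue mentioned above).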