Only one category classified on the whole dataset in the confusion matrix

I’ve been trying to train a model to classify the language of surnames with a char-level RNN.
But over 10 epochs the model shows almost the same train and validation loss.
And in the confusion matrix all predictions fall into a single column.
Here’s my notebook on Colab.

What am I doing wrong?
Can someone give me some advice?

Hi Alger_none!

I’ve only had a really quick look at your code - I can have a more in-depth look if this suggestion doesn’t work.

Specifically, I’m looking at your train function. It looks as though you’re trying to zero_grad() your model, not your optimizer. My suggestion would be to refactor your train function to:

  • zero_grad() your optimizer, to avoid accumulating gradients. You usually do this at the start of your training loop.
  • step() your optimizer after `loss.backward()`, as in the sketch below.
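Something like this minimal sketch, assuming you build an optimizer over your model’s parameters. The RNNCell, sizes and dummy data here are just stand-ins for your char-level model and dataset, not your actual code:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs on its own; swap in your own model, sizes and data.
n_letters, n_hidden, n_categories = 57, 128, 18
cell = nn.RNNCell(n_letters, n_hidden)
readout = nn.Sequential(nn.Linear(n_hidden, n_categories), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()
optimizer = torch.optim.SGD(list(cell.parameters()) + list(readout.parameters()), lr=0.005)

def train_step(category_tensor, line_tensor):
    optimizer.zero_grad()                  # zero the optimizer, not just the model
    hidden = torch.zeros(1, n_hidden)      # fresh hidden state for each name
    for i in range(line_tensor.size(0)):   # feed the name one character at a time
        hidden = cell(line_tensor[i], hidden)
    output = readout(hidden)
    loss = criterion(output, category_tensor)
    loss.backward()                        # compute gradients
    optimizer.step()                       # update parameters after backward()
    return output, loss.item()

# Dummy one-hot "name" of 5 characters and a dummy target category.
line_tensor = torch.zeros(5, 1, n_letters)
line_tensor[torch.arange(5), 0, torch.randint(n_letters, (5,))] = 1.0
train_step(torch.tensor([3]), line_tensor)
```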

This tutorial should help you out!

Actually my notebook is a modified version of that nice notebook, in which there’s no optimizer and SGD is done by manually subtracting the gradients.

So first we zero our net’s gradients:

rnn.zero_grad()

After that we get the net’s output and calculate our loss:

loss = criterion(output, category_tensor)

Next we calculate the derivatives via backward():

loss.backward()

and finally do the manual subtraction:

for p in rnn.parameters():
    p.data.add_(-learning_rate, p.grad.data)
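Putting those pieces together, my train function looks roughly like this (rnn, criterion and learning_rate are defined in earlier cells of the notebook):

```python
def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()              # fresh hidden state for each name
    rnn.zero_grad()                        # clear the gradients stored on the parameters

    # feed the name one character at a time, keeping only the last output
    for i in range(line_tensor.size(0)):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = criterion(output, category_tensor)
    loss.backward()

    # manual SGD: move each parameter against its gradient
    for p in rnn.parameters():
        p.data.add_(-learning_rate, p.grad.data)

    return output, loss.item()
```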

What I’ve changed is that I replaced the random choice of data points with train/validation datasets and added loss output.
But I can’t find the place where I’m breaking something.
Can you please take another look?
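For reference, the train/validation split I added looks roughly like this (a simplified sketch with placeholder data, not the exact notebook code):

```python
import torch
from torch.utils.data import random_split

# Placeholder data: in the notebook this is a list of (category_tensor, line_tensor)
# pairs built from the surname files.
all_pairs = [(torch.tensor([i % 18]), torch.zeros(5, 1, 57)) for i in range(100)]

n_valid = int(0.2 * len(all_pairs))
train_set, valid_set = random_split(all_pairs, [len(all_pairs) - n_valid, n_valid])

for category_tensor, line_tensor in train_set:   # iterate instead of sampling at random
    pass  # train(category_tensor, line_tensor) goes here
```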

I forgot to reply to your message, but my previous message is exactly about that :grinning:

And after every such training run I get a different confusion matrix with the same pattern:
only one category is predicted for the whole dataset (maybe with some noise).

Examples: [confusion matrix plots], and so forth.

I’ve also got the following losses plot:
[losses plot]

It seems to me that the trouble is related to the hidden state, but I’m not sure about that.
Any thoughts, please?

I am not really familiar with RNNs, but looking at your code, I see hidden.detach_(), which removes the variable from the computation graph and therefore may cause your error. Here’s a link that explains what detach() does.
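To illustrate with a toy example (plain tensors, not your model):

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

hidden = w * 3               # hidden is part of the computation graph
hidden.sum().backward()
print(w.grad)                # tensor([3.]): the gradient reaches w through hidden

w.grad = None
hidden = w * 3
hidden.detach_()             # in-place detach: the history behind hidden is dropped
print(hidden.requires_grad)  # False: a backward() through anything built on top of
                             # hidden no longer produces gradients for w
```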

But it has the same effect even without that; that’s the problem :(
Any other thoughts?

Unfortunately, not really (not a lot of experience with RNNs). Were you able to reproduce the results from the tutorial that you linked earlier?

Yes, I did.
But there was no splitting of the dataset into train/valid there, and data points were chosen at random.
So I only added the data splitting.
The most confusing part is that only one category column is “active” in the confusion matrix (screenshots above) after every training run.

And from the first epoch the validation loss increases while the training loss decreases.

The loss difference between training and validation suggests overfitting, which is normal considering that your model seems to always be predicting the same class, as shown in the confusion matrix.

I’m afraid I can’t be of more help, but I would try to find where your model and the one from the tutorial diverge (check if the variable hidden is similar in both cases).
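One rough way to check that, as a sketch: record the hidden state step by step in both notebooks (same seed, same input name) and compare. The RNNCell below is only a stand-in for the real models:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_letters, n_hidden = 57, 128
cell = nn.RNNCell(n_letters, n_hidden)   # stand-in for the char-level RNN

def run_and_record(line_tensor):
    """Feed a name character by character, recording the hidden state at each step."""
    hidden = torch.zeros(1, n_hidden)
    states = []
    for i in range(line_tensor.size(0)):
        hidden = cell(line_tensor[i], hidden)
        states.append(hidden.detach().clone())
    return states

# The same dummy one-hot "name" fed through two runs; with identical code and
# weights the recorded hidden states should match at every step.
line = torch.zeros(4, 1, n_letters)
line[torch.arange(4), 0, torch.tensor([1, 5, 9, 2])] = 1.0

for i, (a, b) in enumerate(zip(run_and_record(line), run_and_record(line))):
    print(i, torch.allclose(a, b))
```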

Best of luck!

That’s a pity :(
I’ll try to track down the divergence…