Using mini-batches causes low accuracy in a sequence classification task

I have a multi-class classification dataset like the one below. There are five classes.

Input: i like summer. the weather is nice.
Output: 3 // WEATHER class index

Then I created a model. It runs without errors, but the accuracy is only high when I set the batch size to 1. If I use a slightly larger batch size (such as 3, 5, or 10), the accuracy drops by more than 30%.

When I use mini-batches, I pad the sequences with 0 to a fixed length and then reverse them, so the padding ends up at the front. For example, the following mini-batch is 53x3 (length x batch_size).

I assume the padding is breaking something in my model. Or is this a common phenomenon when training on sequences? (A sketch of how such a batch can be built follows the example.)

    0    17     0
    0   484     0
    0   481     0
    0   605     0
    0   539     0
    0   675     0
    0   640     0
  539   334     0
  126    44     0
  699   216     0
  256   570     0
  334   688   539
  578   251   126
  525     3   563
  295     8   256
   27   525   334
  578   131   701
   87    63   578
  235   457    71
  334   205   525
  119   386   457
  444    95    35
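
For reference, this is roughly how such a batch can be built. This is just a minimal sketch, not my actual preprocessing code; it assumes torch.nn.utils.rnn.pad_sequence and torch.flip are available, and the token ids are made up.

import torch
from torch.nn.utils.rnn import pad_sequence

# three variable-length sequences of token ids (hypothetical values)
seqs = [torch.tensor([539, 126, 699, 256]),
        torch.tensor([17, 484, 481]),
        torch.tensor([539, 126])]

# pad with 0 at the end -> shape (L, bs), here (4, 3)
padded = pad_sequence(seqs, batch_first=False, padding_value=0)

# reverse along the time dimension, so the padding ends up at the front
batch = torch.flip(padded, dims=[0])
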
Here is the model:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Simple(nn.Module):
    def __init__(self, vocab_size, embd_size, hidden_size, class_size):
        super(Simple, self).__init__()
        self.embd = nn.Embedding(vocab_size, embd_size, padding_idx=0)
        self.ctx_encoder = nn.GRU(embd_size, hidden_size, bidirectional=True)
        self.decoder     = nn.Linear(hidden_size*2, hidden_size)
        self.last_layer  = nn.Linear(hidden_size, class_size)

    def forward(self, x):
        '''
        x: (L, bs)  batch first is False
        '''
        batch_size = x.size(1)
        x = self.embd(x) # (L, bs, E)
        _, h = self.ctx_encoder(x) # (L, bs,  2H), (2, bs, H)
        h = h.view(batch_size, -1) # (bs, 2H)
        out = self.decoder(h) # (bs, H)
        out = self.last_layer(out) # (bs, class_size)
        return F.log_softmax(out, -1)

Did you use the provided pack_padded_sequence (https://pytorch.org/docs/master/nn.html#torch.nn.utils.rnn.pack_padded_sequence) to make sure that it doesn’t forward the pads?
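
Something along these lines; a rough sketch with made-up sizes, and note that pack_padded_sequence expects the batch to be padded at the end and (at least in older versions) sorted by true length in decreasing order:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# made-up sizes: a (L, bs) batch padded at the END with 0, plus the true lengths
lengths = [22, 15, 11]                       # descending
x = torch.randint(1, 706, (22, 3))           # stand-in for the real token batch

embd = nn.Embedding(706, 100, padding_idx=0)
gru = nn.GRU(100, 64, bidirectional=True)

emb = embd(x)                                # (L, bs, E)
packed = pack_padded_sequence(emb, lengths)  # the GRU will skip the pads
_, h = gru(packed)                           # h: (2, bs, H), computed from real tokens only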

OMG. I noticed that view's behavior was not what I thought.
I needed to use concatenation instead.

# h = h.view(batch_size, -1) # (bs, 2H)
h = torch.cat([hh for hh in h], -1) # (bs, 2H)

Here is a simple example:

In [93]: h=torch.randn(2,3,4)

In [94]: h
Out[94]:

(0 ,.,.) =
 -0.3609 -1.3455 -0.5885  0.2888
  0.5073  1.5901  0.5655 -0.0937
  0.4892  0.7472 -0.4066 -0.8092

(1 ,.,.) =
 -1.0095  0.1905 -0.1127  2.2638
 -0.7856 -0.0417 -0.0152  1.2649
  0.6107  1.5546  0.5480  0.6572
[torch.FloatTensor of size 2x3x4]

In [96]: torch.cat([hh for hh in h], -1)  # Correct
Out[96]:

-0.3609 -1.3455 -0.5885  0.2888 -1.0095  0.1905 -0.1127  2.2638
 0.5073  1.5901  0.5655 -0.0937 -0.7856 -0.0417 -0.0152  1.2649
 0.4892  0.7472 -0.4066 -0.8092  0.6107  1.5546  0.5480  0.6572
[torch.FloatTensor of size 3x8]

In [97]: h.view(3, -1)  # Wrong
Out[97]:

-0.3609 -1.3455 -0.5885  0.2888  0.5073  1.5901  0.5655 -0.0937
 0.4892  0.7472 -0.4066 -0.8092 -1.0095  0.1905 -0.1127  2.2638
-0.7856 -0.0417 -0.0152  1.2649  0.6107  1.5546  0.5480  0.6572
[torch.FloatTensor of size 3x8]
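
If I'm not mistaken, an equivalent fix is to move the batch dimension to the front before flattening; the plain view on the original (2, 3, 4) layout mixes rows that belong to different batch elements, as shown above.

h.transpose(0, 1).contiguous().view(3, -1)  # gives the same 3x8 tensor as Out[96]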

It turned out to be a simple mistake: my view call was not the correct way to concatenate the two directions of the bidirectional hidden state.
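
For anyone who finds this later, the corrected forward of the Simple class above is the same as before with only the concatenation line changed:

    def forward(self, x):
        '''
        x: (L, bs)  batch first is False
        '''
        x = self.embd(x)                    # (L, bs, E)
        _, h = self.ctx_encoder(x)          # h: (2, bs, H), one hidden state per direction
        h = torch.cat([hh for hh in h], -1) # (bs, 2H): concatenate the two directions per example
        out = self.decoder(h)               # (bs, H)
        out = self.last_layer(out)          # (bs, class_size)
        return F.log_softmax(out, -1)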

I was encountering the same problem, and your post helped me. Thanks!