Sequence multilabel classification accuracy problem

omph · May 6, 2020, 6:14pm

I’m trying to implement a multi-label classification task, and currently my model has Embedding, GRU, 2x Linear layers.

I have padded the data, and its shape is (seq_len x batch) where seq_len is the longest sequence in that batch. Targets are multi-hot encoded as I’m using BCEWithLogitsLoss.

I have a weird issue that when using batch size > 1, I get much lower accuracy (0.3) when using batch size = 1 (0.8). I suspected it might be padding thing, but I was also able to reproduce this with same length sequences. I’m trying my luck here if anyone has encountered something similar and what the problem was?

ptrblck · May 7, 2020, 6:23am

How large is the batch size for the low accuracy?
Huge batch sizes tend to give a worse accuracy, but the gap seems to be too large for it.

Could you post the code of your model and training routine, so that we could have a look, if maybe a slicing operation is wrong?

ptrblck · May 7, 2020, 8:21am

I assume your input has the size [batch_size, seq_len]?
If so, then self.gru would get an input of [batch_size, seq_len, embedded_dim], while it expects an input of [seq_len, batch_size, input_size] in the default setup.

If my assumptions are correct, you could either permute the input or use batch_first=True while creating the nn.GRU, which would then expect an input of [batch_size, seq_len, features].

omph · May 7, 2020, 8:26am

Actually it is seq_len x batch_size. I printed shapes for all layers:

inputs shape:  torch.Size([377, 2])
embeds shape:  torch.Size([377, 2, 100])
hidden[0] shape:  torch.Size([2, 64])
output shape:  torch.Size([2, 126])

ptrblck · May 7, 2020, 8:35am

The shapes look correct.
I’m currently unsure, what might be wrong
Did you verify that your collate_fn is creating valid batches?

omph · May 7, 2020, 8:39am

Afaik yes. But anyway this helps as at least it’s not some trivial bug in the model, but might be related to something else. Thanks for your help!