Hi everyone! I have a biLSTM model that I’m using to classify posts; it is a binary classification task. I am using batch_first=True, so the input to the LSTM has shape [8x50x768] (batch x sequence length x features). I then take the ‘output’ of the LSTM layer, which has shape [8x50x40], and pass it through a linear layer and then a sigmoid function to map it to a value between 0 and 1. However, the result is a 3D tensor of shape [8x50x1], and I’m unsure how to reduce that to a single value for each item in the batch so I can compare it to the labels, which are a list with one value per batch item.
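To make the shapes concrete, here is a minimal sketch that reproduces the setup described above. The hidden size of 20 per direction (giving 2 x 20 = 40 output features), the single LSTM layer, and the random input are assumptions for illustration, not my exact model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumption: hidden_size=20 per direction, so the biLSTM output has 2 * 20 = 40 features.
lstm = nn.LSTM(input_size=768, hidden_size=20, batch_first=True, bidirectional=True)
linear = nn.Linear(40, 1)

x = torch.randn(8, 50, 768)             # [batch, seq_len, features], random stand-in data
output, (h_n, c_n) = lstm(x)            # output: [8, 50, 40], one vector per time step
probs = torch.sigmoid(linear(output))   # [8, 50, 1], one value per time step, not per post

print(output.shape, probs.shape)
```

The linear layer is applied independently at each of the 50 time steps, which is why the sequence dimension is still there at the end.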
I read online that you can use max pooling, but I’m not entirely sure that is correct. I tried it with torch.max(output, 1), which leaves me with a tensor of shape [8x40]; after I pass that through the other layers I end up with [8x1], which I then squeeze so I can compare it with the labels. I’m not sure whether any of this is “correct”, though, so I would appreciate it if someone could explain it to me.
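For reference, this is roughly what the max pooling version looks like as a sketch. The layer sizes, the random data, and the use of nn.BCELoss are assumptions for illustration. Note that torch.max with a dim argument returns a (values, indices) pair, so only the values are kept here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes: 20 hidden units per direction, so 40 features after the biLSTM.
lstm = nn.LSTM(input_size=768, hidden_size=20, batch_first=True, bidirectional=True)
linear = nn.Linear(40, 1)

x = torch.randn(8, 50, 768)                   # random stand-in for the real inputs
labels = torch.randint(0, 2, (8,)).float()    # random stand-in for the real 0/1 labels

output, _ = lstm(x)                           # [8, 50, 40]
pooled, _ = torch.max(output, dim=1)          # max over the 50 time steps -> [8, 40]
logits = linear(pooled).squeeze(1)            # [8, 1] -> [8]
probs = torch.sigmoid(logits)                 # [8], one probability per post

# Assumption: binary cross-entropy on probabilities as the loss.
loss = nn.BCELoss()(probs, labels)
print(pooled.shape, probs.shape, loss.item())
```

Pooling before the linear layer collapses the sequence dimension, so the final tensor has one value per item in the batch and lines up with the labels.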
A side question: for binary classification, should you use the output part of what the LSTM returns, or should you use the h_n or c_n parts?
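To show what the h_n alternative would look like, here is a sketch assuming a single-layer bidirectional LSTM with 20 hidden units per direction (both assumptions). For that case, h_n holds the final hidden state of each direction, and the same values can also be read out of output: the forward half at the last time step and the backward half at the first time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden = 20  # assumed hidden size per direction
lstm = nn.LSTM(input_size=768, hidden_size=hidden, batch_first=True, bidirectional=True)
x = torch.randn(8, 50, 768)

with torch.no_grad():
    output, (h_n, c_n) = lstm(x)

# h_n is [num_layers * num_directions, batch, hidden] = [2, 8, 20],
# regardless of batch_first. For the last layer, h_n[-2] is the forward
# direction and h_n[-1] is the backward direction.
final = torch.cat([h_n[-2], h_n[-1]], dim=1)            # [8, 40]

# The same values taken from `output` (single-layer case):
from_output = torch.cat(
    [output[:, -1, :hidden],    # forward direction finishes at the last time step
     output[:, 0, hidden:]],    # backward direction finishes at the first time step
    dim=1,
)
same = torch.allclose(final, from_output, atol=1e-6)
print(h_n.shape, final.shape, same)
```

The resulting [8x40] tensor could be passed to the linear layer in place of the max-pooled one. This equivalence assumes unpadded sequences; with padded batches the last time step of output may correspond to padding.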