LSTM multiclass output shape

torch · May 16, 2019, 4:57pm

I’m building a multiclass classification model using an LSTM. I’m struggling to get my head around how to shape the output so that the loss can be calculated.

Here is my code:

import torch.nn as nn

class Network(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, weights=None):
        # The class name passed to super() must match the class being defined
        super(Network, self).__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        # Optionally load pretrained embedding weights
        if weights is not None:
            self.embedding.weight.data.copy_(weights)

        # Freeze the embedding layer so it is not updated during training
        self.embedding.weight.requires_grad = False
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.linear = nn.Linear(hidden_dim, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, inp):
        embed_out = self.embedding(inp)
        # hidden1 is the final hidden state, hidden2 the final cell state
        out, (hidden1, hidden2) = self.lstm(embed_out)
        out = self.sigmoid(self.linear(hidden1.squeeze(0)))
        return out.squeeze(-1)

The output shape for my batch is [342, 51], whereas the shape of the label is [32, 51], which means the loss can’t be calculated. Any pointers would be very helpful. Thank you.

DanielLuci · May 16, 2019, 7:28pm

Can you give more details? What parameters are you using for embedding_dim, hidden_dim and output_dim? Also what is the format of your label?

gslaller · May 16, 2019, 7:40pm

By default, the LSTM layer takes a tensor of shape (seq_len, batch, features). To comply with this, you have to call the LSTM with “self.lstm(embed_out.transpose(0,1))”, unless your inp is already in the shape (seq_len, batch) or you have defined the LSTM with “batch_first=True”. I don’t know why you are working with hidden1; usually you take “out = self.sigmoid(self.linear(out)).reshape(-1, class_num)”, so that out has the shape (seq_len*batch, class_num).
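A minimal sketch of the two layout conventions mentioned above, assuming a batch-first embedding output. The batch size, sequence length, and layer sizes here are illustrative values only, and the variable names are hypothetical.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not taken from the original post)
batch, seq_len, embedding_dim, hidden_dim = 32, 7, 300, 100

# Stand-in for an embedding output laid out as (batch, seq_len, features)
embed_out = torch.randn(batch, seq_len, embedding_dim)

# Default nn.LSTM expects (seq_len, batch, features), so swap the first two dims
lstm = nn.LSTM(embedding_dim, hidden_dim)
out, (h_n, c_n) = lstm(embed_out.transpose(0, 1))
out_shape = tuple(out.shape)   # (seq_len, batch, hidden_dim)
h_shape = tuple(h_n.shape)     # (num_layers, batch, hidden_dim)

# With batch_first=True the input and output keep the (batch, seq_len, ...) layout
lstm_bf = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
out_bf, _ = lstm_bf(embed_out)
out_bf_shape = tuple(out_bf.shape)  # (batch, seq_len, hidden_dim)
```

Note that batch_first only affects the input and the per-step output; the final hidden state h_n keeps the (num_layers, batch, hidden_dim) layout either way.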

torch · May 17, 2019, 8:26am

Sure

input_dim = 3908
embedding_dim = 300
hidden_dim = 100
output_dim = 51

Label format is a matrix of ints which represent the multiple classes.
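For reference, a rough sketch of how a [32, 51] label matrix could be paired with a sigmoid output. This assumes the labels are multi-hot 0/1 integers and that a binary cross-entropy loss is used; neither is stated in the thread, and the random tensors are stand-ins for real data.

```python
import torch
import torch.nn as nn

batch, num_classes = 32, 51

# Assumption: multi-hot integer labels, one 0/1 entry per class
labels = torch.randint(0, 2, (batch, num_classes))

# Stand-in for the model's sigmoid output, same shape as the labels
probs = torch.sigmoid(torch.randn(batch, num_classes))

# nn.BCELoss expects float targets with the same shape as the input
criterion = nn.BCELoss()
loss = criterion(probs, labels.float())
```

Under these assumptions, the loss can only be computed once the model output is also [32, 51], which is the shape mismatch being discussed.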

torch · May 17, 2019, 9:56am

Thanks, this is helpful. I still can’t align the final shape, however. The target batch size is 32 with 51 classes, and I can’t figure out how to get the output of the model to match this matrix. I’ve made your suggested changes and I’m getting a [9952, 51] output. Are there some additional steps I’m missing to get the model output to match the target shape?

Thanks for your help!

gslaller · May 17, 2019, 11:14am

Well, with that reshape your output isn’t going to match a target of shape (32,51) unless the length of the sequences is 1. Perhaps you just want “out = self.sigmoid(self.linear(out[-1,:,:])).reshape(-1, class_num)”, which takes just the last element of each sequence and thus has the shape (batch, class_num).
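A small sketch of the difference, using the layer sizes given earlier in the thread. The sequence length of 311 is an assumption, chosen because 311 × 32 = 9952 matches the reported output shape; the random input is a stand-in for real token indices.

```python
import torch
import torch.nn as nn

# seq_len is an assumption inferred from 9952 / 32; the other sizes are from the thread
seq_len, batch = 311, 32
input_dim, embedding_dim, hidden_dim, output_dim = 3908, 300, 100, 51

embedding = nn.Embedding(input_dim, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_dim)
linear = nn.Linear(hidden_dim, output_dim)

# Input already laid out as (seq_len, batch)
inp = torch.randint(0, input_dim, (seq_len, batch))
out, (h_n, c_n) = lstm(embedding(inp))

# Every time step: (seq_len * batch, output_dim)
all_steps = torch.sigmoid(linear(out)).reshape(-1, output_dim)

# Last time step only: (batch, output_dim)
last_step = torch.sigmoid(linear(out[-1, :, :]))

# For a single-layer, unidirectional LSTM, the last output equals the final hidden state
same_as_hidden = torch.allclose(out[-1], h_n[0])
```

Since out[-1] and h_n[0] coincide in this single-layer, unidirectional setup, the original hidden1.squeeze(0) approach should also give a (batch, output_dim) result, provided the input reaches the LSTM as (seq_len, batch, features).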

torch · May 17, 2019, 11:23am

Thank you - this solves the issue!