I’m building a multiclass classification model using a GRU. I’m struggling to get my head around how to shape the output such that the loss can be calculated.
Here is my code:
```python
class GRUClassifier(nn.Module):  # class definition inferred; omitted from the original post
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, weights=None):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        if weights is not None:
            self.embedding.weight.requires_grad = False
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.linear = nn.Linear(hidden_dim, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, inp):
        embed_out = self.embedding(inp)
        out, (hidden1, hidden2) = self.lstm(embed_out)
        out = self.sigmoid(self.linear(hidden1.squeeze(0)))
        return out
```
The output shape for my batch is [342, 51], whereas the shape of the labels is [32, 51], which means the loss can't be calculated. Any pointers would be very helpful - thank you.
Can you give more details? What parameters are you using for embedding_dim, hidden_dim and output_dim? Also what is the format of your label?
By default, the LSTM layer takes a tensor of shape (seq_len, batch, features), so to comply with this you have to call the LSTM with `self.lstm(embed_out.transpose(0, 1))` - unless your `inp` is already shaped (seq_len, batch), or you defined the LSTM with `batch_first=True`. I'm not sure why you are working with `hidden1`; usually you take `out = self.sigmoid(self.linear(out)).reshape(-1, class_num)`, so your output has shape (seq_len * batch, class_num).
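To make the shape conventions concrete, here's a minimal sketch with small dummy dimensions (not the poster's actual ones) showing what `nn.LSTM` expects and returns with and without `batch_first`:

```python
import torch
import torch.nn as nn

seq_len, batch, embedding_dim, hidden_dim = 5, 4, 8, 6

# Default nn.LSTM expects input of shape (seq_len, batch, features)
lstm = nn.LSTM(embedding_dim, hidden_dim)
x = torch.randn(seq_len, batch, embedding_dim)
out, (h_n, c_n) = lstm(x)
print(out.shape)  # (seq_len, batch, hidden_dim) -> torch.Size([5, 4, 6])
print(h_n.shape)  # (num_layers, batch, hidden_dim) -> torch.Size([1, 4, 6])

# With batch_first=True, input and output become (batch, seq_len, features)
lstm_bf = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
out_bf, _ = lstm_bf(x.transpose(0, 1))
print(out_bf.shape)  # (batch, seq_len, hidden_dim) -> torch.Size([4, 5, 6])
```

Note that `h_n` (called `hidden1` in the original code) only holds the final hidden state per layer, which is why its shapes differ from `out`.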
```python
input_dim = 3908
embedding_dim = 300
hidden_dim = 100
output_dim = 51
```
The label format is a matrix of ints representing the multiple classes.
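Assuming a multi-label setup (which the `Sigmoid` output and the int label matrix suggest), the loss computation would look roughly like this sketch - `BCELoss` expects model output and targets of the same (batch, num_classes) shape, with targets cast to float:

```python
import torch
import torch.nn as nn

batch, num_classes = 32, 51

# Stand-in for the model output: sigmoid activations in (0, 1)
output = torch.sigmoid(torch.randn(batch, num_classes))
# Stand-in for the label matrix: 0/1 ints, cast to float for BCELoss
labels = torch.randint(0, 2, (batch, num_classes)).float()

loss = nn.BCELoss()(output, labels)
print(loss.item())
```

This is why the shapes must line up: the loss is computed elementwise over the (batch, num_classes) matrix.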
Thanks, this is helpful. I still can't align the final shape, however. The target batch size is 32 with 51 classes, and I can't figure out how to get the output of the model to match this matrix. I've made your suggested changes and I'm now getting a [9952, 51] output - are there additional steps I'm missing to get the model output to match the target shape?
Thanks for your help!
Well, your output shape isn't going to be (32, 51) unless the length of the sequences is 1. Perhaps you just want `out = self.sigmoid(self.linear(out[-1, :, :]))`, which takes just the last element of each sequence, giving a shape of (batch, class_num).
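Putting the suggestions together, a corrected model might look like the sketch below (the class name and the sequence length of 311 are assumptions for illustration; 32 × 311 = 9952, consistent with the shape reported above):

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.linear = nn.Linear(hidden_dim, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, inp):
        # inp: (batch, seq_len) -> transpose to (seq_len, batch) for the LSTM
        embed_out = self.embedding(inp.transpose(0, 1))
        out, (h_n, c_n) = self.lstm(embed_out)
        # Take the last time step: (batch, hidden_dim) -> (batch, output_dim)
        return self.sigmoid(self.linear(out[-1]))

model = GRUClassifier(input_dim=3908, embedding_dim=300, hidden_dim=100, output_dim=51)
inp = torch.randint(0, 3908, (32, 311))  # batch of 32 sequences of length 311
print(model(inp).shape)  # torch.Size([32, 51]) - matches the label matrix
```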
Thank you - this solves the issue!