I am creating a part-of-speech tagging LSTM and am not sure how to shape the data when batching sentences of similar length.
X.shape = (100, 60, 4) [batch, sentence length, features per word]
output.shape = (100, 60, 10) [batch, sentence length, word type (10 possible types)]
y.shape = (100, 60)
The error I get: Expected target size (100, 10), got torch.Size([100, 60])
Here is the code of my network:
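import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Bidirectional GRU: the output feature size is 2 * hidden_size = 64
        self.lstm1 = nn.GRU(input_size=4, hidden_size=32, bidirectional=True)
        self.fcn1 = nn.Linear(64, 512)
        self.fcn2 = nn.Linear(512, 512)
        self.fcn3 = nn.Linear(512, 10)
        self.softmax = nn.LogSoftmax(dim=2)

    def forward(self, x):
        x, _ = self.lstm1(x)
        x = self.fcn1(x)
        x = self.fcn2(x)
        x = self.fcn3(x)
        return x

net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)

# X has shape (100, 60, 4) and y has shape (100, 60), as described above
for epoch in range(5000):
    running_loss = 0.0
    optimizer.zero_grad()
    outputs = net(X)
    # This call raises the "Expected target size" error quoted above
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
    if epoch % 100 == 0:
        print("Epoch: ", epoch, "Loss: ", running_loss)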
Thanks for the help!
nn.GRU expects the input in the shape [seq_len, batch_size, input_features] by default.
You could use batch_first=True to pass the inputs as [batch_size, seq_len, input_features], which seems to match your current input shape.
The output will then have the shape [batch_size, seq_len, hidden*num_directions].
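To make the shapes concrete, here is a minimal sketch that uses the sizes from the question and random input. The variable names are made up for illustration.

```python
import torch
import torch.nn as nn

# Sizes taken from the question: 100 sentences, 60 words each, 4 features per word
batch_size, seq_len, n_features, hidden_size = 100, 60, 4, 32

gru = nn.GRU(input_size=n_features, hidden_size=hidden_size,
             bidirectional=True, batch_first=True)

x = torch.randn(batch_size, seq_len, n_features)  # random stand-in for X
out, h_n = gru(x)

print(out.shape)  # torch.Size([100, 60, 64]) -> [batch_size, seq_len, hidden*num_directions]
print(h_n.shape)  # torch.Size([2, 100, 32]) -> batch_first does not change the hidden state layout
```

Note that batch_first only affects the input and output tensors, not the returned hidden state.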
The following linear layers will apply their operations on dim1 “in a loop”, i.e. the linear transformation will be applied to each sample in the temporal dimension.
x.squeeze(1) won’t have any effect, as the temporal dimension will stay at 60.
Also, you don’t have any non-linearities between the linear layers, so you might want to add them.
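The two points above can be checked with a small sketch. It assumes the GRU output shape [100, 60, 64] from the question. The head module and the choice of nn.ReLU are illustrative assumptions, not the original poster's code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(64, 512)
x = torch.randn(100, 60, 64)  # [batch_size, seq_len, hidden*num_directions]

# Applying the layer to the whole 3D tensor ...
out_all = linear(x)
# ... should match applying it to every time step separately and stacking on dim1
out_loop = torch.stack([linear(x[:, t]) for t in range(x.size(1))], dim=1)
print(out_all.shape)  # torch.Size([100, 60, 512])
print(torch.allclose(out_all, out_loop, atol=1e-5))

# Hypothetical classifier head with a non-linearity between the linear layers
head = nn.Sequential(
    nn.Linear(64, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)
logits = head(x)
print(logits.shape)  # torch.Size([100, 60, 10])
```

Without the activations, the three stacked linear layers collapse into a single linear transformation.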
Thanks for the reply. I also had to change the loss function, because cross entropy does not seem to work for a many-to-many LSTM. Instead I am using
criterion = nn.MSELoss()
The output has shape [100, 60, 10] and y has shape [100, 60, 10], where the last dimension of size 10 is one-hot encoded. I can’t get the cross entropy criterion to work on data in this form.
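If the targets are stored one-hot encoded as described here, one possible way to get back class indices is an argmax over the last dimension. This is a sketch with made-up random labels; class_idx and y_onehot are hypothetical names.

```python
import torch
import torch.nn.functional as F

# Made-up labels: 100 sentences, 60 words, 10 possible word types
class_idx = torch.randint(0, 10, (100, 60))
y_onehot = F.one_hot(class_idx, num_classes=10).float()  # [100, 60, 10]

# argmax over the class dimension recovers the integer class indices
y = y_onehot.argmax(dim=2)
print(y.shape)  # torch.Size([100, 60])
print(y.dtype)  # torch.int64
```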
What is your number of classes and what do the dimensions of the output represent?
Are you working on a multi-class classification, where each time step would correspond to a single class?
nn.CrossEntropyLoss expects the model output to have the shape [batch_size, nb_classes, seq_len] and the target to have the shape [batch_size, seq_len], containing the class indices in the range [0, nb_classes-1].
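As a minimal sketch of these expected shapes, here is the loss computed on random tensors with the sizes from this thread. The tensors stand in for real model output and labels.

```python
import torch
import torch.nn as nn

batch_size, nb_classes, seq_len = 100, 10, 60
criterion = nn.CrossEntropyLoss()

# Raw logits with the class dimension in dim1
output = torch.randn(batch_size, nb_classes, seq_len, requires_grad=True)
# Integer class indices in [0, nb_classes-1], one per time step
target = torch.randint(0, nb_classes, (batch_size, seq_len))

loss = criterion(output, target)
loss.backward()
print(loss.shape)  # torch.Size([]) -> a scalar, averaged over all time steps
```

nn.CrossEntropyLoss applies log-softmax internally, so the model would pass raw logits rather than the output of nn.LogSoftmax.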
I am working on a multi-class classification where each time step t_i needs to be classified.
I am not sure how I would go about transforming the output.
The current output is [batch size, sequence length, number of classes].
The target is already correct: [batch size, sequence length].
output = output.permute(0, 2, 1)
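Here is a short sketch of the permute on random data with the shapes from this thread. The flattened variant is an alternative added here for comparison, not something suggested above; with the default mean reduction both should give the same loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()

output = torch.randn(100, 60, 10)         # [batch_size, seq_len, nb_classes]
target = torch.randint(0, 10, (100, 60))  # [batch_size, seq_len]

# Move the class dimension to dim1: [batch_size, nb_classes, seq_len]
loss_permute = criterion(output.permute(0, 2, 1), target)

# Alternative: treat every time step as its own sample
loss_flat = criterion(output.reshape(-1, 10), target.reshape(-1))

print(output.permute(0, 2, 1).shape)  # torch.Size([100, 10, 60])
print(torch.allclose(loss_permute, loss_flat))
```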