I'm creating a part-of-speech LSTM and am not sure how to shape the data when batching sentences of similar length.
X.shape = (100, 60, 4)  [batch, sentence length, features per word]
output.shape = (100, 60, 10)  [batch, sentence length, word type (10 potential types)]
y.shape = (100, 60)
The error I get: Expected target size (100, 10), got torch.Size([100, 60])
Below is the code for my network:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Bidirectional GRU: 4 input features per word, 32 hidden units per direction
        self.lstm1 = nn.GRU(input_size=4, hidden_size=32, bidirectional=True)
        self.fcn1 = nn.Linear(64, 512)  # 64 = 32 hidden * 2 directions
        self.fcn2 = nn.Linear(512, 512)
        self.fcn3 = nn.Linear(512, 10)  # 10 word types
        self.softmax = nn.LogSoftmax(dim=2)

    def forward(self, x):
        x, _ = self.lstm1(x)
        x = self.fcn1(x)
        x = self.fcn2(x)
        x = self.fcn3(x)
        x = x.squeeze(1)
        return x
net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)

for epoch in range(5000):
    optimizer.zero_grad()
    running_loss = 0.0
    outputs = net(X)
    print(outputs.shape, y.shape)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
    if epoch % 100 == 0:
        print("Epoch: ", epoch, "Loss: ", running_loss)
nn.GRU expects the input in the shape [seq_len, batch_size, input_features] by default.
You could pass batch_first=True to provide the inputs as [batch_size, seq_len, input_features], which matches your current input shape.
The output will then have the shape [batch_size, seq_len, hidden_size*num_directions].
The following linear layers apply their transformation to each position along the seq_len dimension independently, i.e. "in a loop" over the time steps.
The last x.squeeze(1) won't have any effect, as the temporal dimension will stay at 60 in dim1.
Also, you don't have any non-linearities between the linear layers, so you might want to add them (see the sketch below).
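A minimal sketch of how the model could look with these two changes applied (batch_first=True and F.relu between the linear layers; the layer sizes are taken from your code). Note that nn.CrossEntropyLoss applies log_softmax internally, so the model should return the raw logits and the unused self.softmax can be dropped:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # batch_first=True lets the input stay as [batch_size, seq_len, input_features]
        self.lstm1 = nn.GRU(input_size=4, hidden_size=32, bidirectional=True, batch_first=True)
        self.fcn1 = nn.Linear(64, 512)
        self.fcn2 = nn.Linear(512, 512)
        self.fcn3 = nn.Linear(512, 10)

    def forward(self, x):
        x, _ = self.lstm1(x)       # [batch, seq_len, 64]
        x = F.relu(self.fcn1(x))   # non-linearity between the linear layers
        x = F.relu(self.fcn2(x))
        x = self.fcn3(x)           # [batch, seq_len, 10] raw logits
        return x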
Thanks for the reply. I also had to change the loss function, since I couldn't get cross entropy to work for this many-to-many setup. Instead I am using

criterion = nn.MSELoss()

with output of shape [100, 60, 10] and y of shape [100, 60, 10], where the 10 is one-hot encoded. I can't get the cross entropy criterion to work on this form of data.
What is your number of classes and what do the dimensions of the output represent?
Are you working on a multi-class classification, where each time step would correspond to a single class?
If so, nn.CrossEntropyLoss expects the model output to have the shape [batch_size, nb_classes, seq_len] and the target [batch_size, seq_len], containing the class indices in the range [0, nb_classes-1].
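A minimal sketch for your shapes ([100, 60, 10] output, 10 classes); y_onehot here is a placeholder name for your one-hot encoded target tensor:

criterion = nn.CrossEntropyLoss()

output = net(X)                   # [100, 60, 10] = [batch, seq_len, nb_classes]
output = output.permute(0, 2, 1)  # [100, 10, 60] = [batch, nb_classes, seq_len]

# Recover class indices from the one-hot targets: [100, 60, 10] -> [100, 60]
target = y_onehot.argmax(dim=2)

loss = criterion(output, target)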