I’m looking for a way to implement a one-to-many RNN/LSTM in PyTorch, but I can’t understand how to evaluate the loss function and feed the outputs of one hidden step forward to the next, as in the picture.
Here’s the raw LSTM code; could somebody help adapt it?

import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim
        # Number of hidden layers
        self.layer_dim = layer_dim
        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initialize hidden and cell state with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        out, (h_t, c_t) = self.lstm(x, (h0, c0))
        # h_t has shape (layer_dim, batch, hidden_dim); take the last layer's
        # hidden state instead of reshaping, which would scramble the data
        out = self.fc(h_t[-1])
        return out

Things to consider: the LSTM takes a hidden state and a cell state. The first step can be initialised with zeroes, as you do, but after that you have to feed the previous step's hidden/cell state back in as the new initial state.
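A minimal sketch of that feedback loop using nn.LSTMCell, which gives you step-by-step control. The dimensions and the projection layer feeding the output back in as the next input are illustrative assumptions, not part of your original model:

```python
import torch
import torch.nn as nn

class OneToManyLSTM(nn.Module):
    # Sketch: emit seq_len outputs from a single input vector,
    # feeding each step's h/c state and output back into the next step.
    def __init__(self, input_dim, hidden_dim, output_dim, seq_len):
        super().__init__()
        self.seq_len = seq_len
        self.cell = nn.LSTMCell(input_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        # hypothetical layer: project each output back to input size
        # so it can be used as the next step's input
        self.back = nn.Linear(output_dim, input_dim)

    def forward(self, x):
        batch = x.size(0)
        # zero initial hidden/cell state, as in your code
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)
        inp = x
        outputs = []
        for _ in range(self.seq_len):
            h, c = self.cell(inp, (h, c))   # previous h/c go back in
            y = self.fc(h)
            outputs.append(y)
            inp = self.back(y)              # previous output is next input
        return torch.stack(outputs, dim=1)  # (batch, seq_len, output_dim)

model = OneToManyLSTM(input_dim=3, hidden_dim=8, output_dim=2, seq_len=5)
out = model(torch.randn(4, 3))  # one input vector per batch element
print(out.shape)                # torch.Size([4, 5, 2])
```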

GRU is good advice, thank you. I have all the y^i for the learning process; how should I calculate the loss?
And about the LSTM: is there any way to do it with PyTorch's built-ins, rather than implementing the LSTM from scratch?
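Since you have a target y^i for every step, one common option (an assumption here, since the task isn't stated) is to compare the whole predicted sequence against the target sequence with a standard criterion such as nn.MSELoss, which averages over batch and time steps; the tensors below are placeholders:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()  # or nn.CrossEntropyLoss for class targets

# placeholder tensors standing in for the model's output sequence
# and the known targets y^i, shape (batch, seq_len, output_dim)
pred = torch.randn(4, 5, 2, requires_grad=True)
target = torch.randn(4, 5, 2)

loss = criterion(pred, target)  # scalar, averaged over batch and time
loss.backward()                 # gradients flow back through every step
```

On the second question: nn.LSTM and nn.LSTMCell are both built-in, so a per-step loop only needs LSTMCell, not a from-scratch implementation.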