Dear PyTorch experts,
I am trying to understand RNNs and how to implement one as a classifier (many-to-one). I've read many tutorials but am still confused. One of these tutorials suggests using the following:
import torch
import torch.nn as nn

# Recurrent neural network (many-to-one)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Set initial hidden and cell states (created on the same device as the input)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        # Forward propagate LSTM
        out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size)
        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])  # shape (batch_size, num_classes), raw scores
        return out
What I am confused about is why the last time step is used as the input to the dense layer. Also, for a binary classification problem, for example, a sigmoid activation is normally applied to the output. Why is it not applied here (or in any of the tutorials I have read)?
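To make the two questions concrete, here is a small sketch of what I think is going on. The sizes, variable names, and the single-logit binary head are my own assumptions for illustration and are not taken from the tutorial. It checks (1) whether the last time step of `out` is the same tensor as the final hidden state of the top LSTM layer, which would explain why it can serve as a summary of the whole sequence, and (2) whether a loss such as `nn.BCEWithLogitsLoss` already applies the sigmoid internally, which would explain why the model returns raw scores.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_length, input_size, hidden_size, num_layers = 4, 7, 5, 8, 2

# Unidirectional LSTM, as in the tutorial code above.
lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
x = torch.randn(batch_size, seq_length, input_size)

# out: (batch, seq, hidden); h_n: (num_layers, batch, hidden)
out, (h_n, c_n) = lstm(x)

# (1) The last time step of `out` vs. the final hidden state of the top layer.
last_step = out[:, -1, :]      # (batch, hidden)
top_layer_final = h_n[-1]      # (batch, hidden)
same_summary = torch.allclose(last_step, top_layer_final, atol=1e-6)

# (2) Assumed binary head: one raw score (logit) per sequence, no sigmoid in the model.
fc = nn.Linear(hidden_size, 1)
logits = fc(last_step).squeeze(1)            # (batch,)
targets = torch.tensor([0.0, 1.0, 1.0, 0.0])  # made-up labels

# BCEWithLogitsLoss applies the sigmoid itself; BCELoss expects probabilities.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)
same_loss = torch.allclose(loss_from_logits, loss_from_probs, atol=1e-6)

print(same_summary, same_loss)
```

If both checks hold, my guess is that the tutorials leave out the activation because the loss handles it (`nn.CrossEntropyLoss` likewise applies log-softmax to raw scores in the multi-class case), and that a sigmoid or softmax is only needed at inference time when actual probabilities are wanted. Is that the right way to think about it?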
Thank you very much for your help
** tutorial from here