Problem with simple RNN and char-based LM

I am building a simple char-based autoencoder using the names.txt dataset and simple RNN units. I use the same shape for the input and the output (size = [8, 27]), but the model does not perform well and I get a high loss.
Here is a sample (x, y) pair from my dataset:

tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 1., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0.]]),

 tensor([15, 12,  9, 22,  9,  1,  0,  0])
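
For context, a pair like this is produced roughly as follows (just a sketch of my encoding; I assume index 0 is the padding '.' token, 'a'-'z' map to 1-26, and every name is padded to length 8):

import torch
import torch.nn.functional as F

# Sketch of the encoding (assumption: 0 = padding '.', 'a'-'z' = 1-26).
def encode_name(name, max_len=8, vocab_size=27):
    idx = [ord(c) - ord('a') + 1 for c in name]       # character -> class index
    idx += [0] * (max_len - len(idx))                 # pad with 0 up to max_len
    y = torch.tensor(idx)                             # integer targets, shape [8]
    x = F.one_hot(y, num_classes=vocab_size).float()  # one-hot inputs, shape [8, 27]
    return x, y

x, y = encode_name("olivia")  # produces the pair shown above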

Here is my simple model:

import torch
import torch.nn as nn

class cls_model(nn.Module):
    def __init__(self, num_layer=2) -> None:
        super().__init__()

        # initial hidden state (defined here but not used in forward below)
        self.h0 = torch.zeros((num_layer, 27))
        self.model = nn.Sequential(nn.RNN(input_size=27, hidden_size=27, num_layers=num_layer))
        self.classifier = nn.Sequential(nn.Linear(27, 27), nn.ReLU(), nn.Linear(27, 27))

    def forward(self, x):
        x, _ = self.model(x)       # RNN outputs for every time step; returned hidden state is discarded
        x = self.classifier(x)     # per-time-step scores over the 27 characters
        return x

Here is the training loop:

import torch.nn.functional as F

model = cls_model(2)

criterion = nn.CrossEntropyLoss()
optim = torch.optim.SGD(model.parameters(), lr=3e-3)

epochs = 5
model.train()

for i in range(epochs):
    for idx, (x, y) in enumerate(ds_loader):

        y_pred = model(x)                       # one 27-way prediction per time step
        y_pred = F.softmax(y_pred, dim=2)       # softmax over the character dimension

        optim.zero_grad()
        loss = torch.tensor(0.0)
        for j in range(y_pred.shape[0]):
            loss += criterion(y_pred[j], y[j])  # accumulate the loss over the batch

        loss.backward()
        optim.step()

Is the logic of the model correct? Why does it not predict well?

What are you trying to predict? Next letter?

I noticed your forward pass doesn’t make use of the hidden layer. What is the point of using an RNN if you aren’t passing the hidden layer through to the next time step?
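
Just to illustrate what I mean, something along these lines would thread the hidden state through explicitly (only a sketch based on your posted model; I'm assuming batch-first inputs of shape [batch, 8, 27]):

import torch
import torch.nn as nn

class cls_model(nn.Module):
    def __init__(self, num_layers=2, hidden_size=27):
        super().__init__()
        self.rnn = nn.RNN(input_size=27, hidden_size=hidden_size,
                          num_layers=num_layers, batch_first=True)
        self.classifier = nn.Sequential(nn.Linear(hidden_size, 27),
                                        nn.ReLU(),
                                        nn.Linear(27, 27))

    def forward(self, x, h0=None):
        # out: outputs at every time step, shape [batch, seq, hidden]
        # h_n: final hidden state, shape [num_layers, batch, hidden]
        out, h_n = self.rnn(x, h0)
        return self.classifier(out), h_n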


I just want to implement a simple autoencoder with an RNN: for each word, I want to reconstruct it. I have fixed the hidden state issue.
Next, I am going to implement next-character prediction.
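
For the next-char version, my plan is roughly to shift the targets by one position relative to the inputs (a sketch, reusing the encode_name helper from my first post):

# Next-character prediction: predict character t+1 from characters up to t.
x_full, y_full = encode_name("olivia")
x_in  = x_full[:-1]   # one-hot chars 0..6, input to the RNN, shape [7, 27]
y_out = y_full[1:]    # class indices of chars 1..7, targets, shape [7]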