RNN for word completion doesn't learn

Hi, I'm trying to implement an RNN that predicts the next char based on an input text of 30 chars. The input is one-hot encoded. For some reason the network doesn't learn. The results of my loss function are:

Relative loss over epoch 0: 3.207843780517578
Relative loss over epoch 1: 2.8688047726949057
Relative loss over epoch 2: 2.8401947021484375
Relative loss over epoch 3: 2.813974698384603
Relative loss over epoch 4: 2.811510960261027
Relative loss over epoch 5: 2.8013088703155518
Relative loss over epoch 6: 2.8062755266825357
Relative loss over epoch 7: 2.8083527088165283
Relative loss over epoch 8: 2.791940609614054
Relative loss over epoch 9: 2.8048322995503745

As you can see, the loss drops noticeably after the first epoch, but after that the network barely changes. Below are my model and the train function for my network. Can someone tell me why the network can't learn?

Further information (a minimal encoding sketch follows below):
The target tensor is of size batch_size (81 in my example) and contains, for each sample, the index of the correct next char in the one-hot alphabet.
The data we feed into the model is of size (length of input text, batch_size, number of different chars).
The output we get is of size (batch_size, number of different chars).
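
To make these shapes concrete, here is a minimal sketch of how a single 30-char window is one-hot encoded (the alphabet and char_to_idx mapping here are simplified placeholders, not my real preprocessing):

import torch

alphabet = "abcdefghijklmnopqrstuvwxyz .,'"      # placeholder alphabet
char_to_idx = {c: i for i, c in enumerate(alphabet)}
no_classes = len(alphabet)

text = "hello world this is a test str"          # one 30-char input window
one_hot = torch.zeros(len(text), no_classes)     # (seq_len, no_classes)
for t, c in enumerate(text):
    one_hot[t, char_to_idx[c]] = 1.0

# stacking batch_size such windows along dim 1 gives
# (seq_len, batch_size, no_classes), the layout the model expects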

import torch
import torch.nn as nn
from torch.autograd import Variable

class LSTM_RNN(nn.Module):

    def __init__(self, no_classes):
        super(LSTM_RNN, self).__init__()

        # num_layers must match the shape of the hidden state built in init_hidden
        self.lstm = nn.LSTM(input_size=no_classes, hidden_size=args.hidden_size,
                            num_layers=args.num_layers)
        self.linear = nn.Linear(in_features=args.hidden_size, out_features=no_classes)

        self.init_hidden()

    def init_hidden(self, batch_size=args.batch_size):
        # fresh zero-initialized hidden and cell states
        h0 = Variable(torch.zeros(args.num_layers, batch_size, args.hidden_size))
        c0 = Variable(torch.zeros(args.num_layers, batch_size, args.hidden_size))
        self.hidden = (h0, c0)
        return self.hidden

    def forward(self, x):
        # x: (seq_len, batch_size, no_classes)
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        # predict from the output of the last time step only
        linear_out = self.linear(lstm_out[-1])
        return linear_out
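
For reference, a quick shape check along these lines (no_classes=55 is just a placeholder value) shows the output size described above:

model = LSTM_RNN(no_classes=55)
dummy = Variable(torch.zeros(30, args.batch_size, 55))  # (seq_len, batch_size, no_classes)
output = model(dummy)
print(output.size())  # (args.batch_size, 55), i.e. (batch_size, number of different chars)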

# Training loop (one epoch)
def train(model, epoch):
    model.init_hidden()  # reset the hidden state at the start of each epoch
    model.train()
    criterion = nn.CrossEntropyLoss()
    total_loss = 0.0

    for batch_idx, (data, target) in enumerate(train_loader):
        # reorder (batch_size, seq_len, no_classes) -> (seq_len, batch_size, no_classes);
        # transpose swaps the axes, whereas view would only reinterpret the memory
        data = data.transpose(0, 1).contiguous()
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target.long())

        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)  # how far the predictions are from the true next chars
        # retain_graph=True is needed because self.hidden is carried over from the
        # previous batch and still references that batch's graph
        loss.backward(retain_graph=True)
        optimizer.step()

        total_loss += loss.data[0]

    relative_loss = total_loss / float(len(train_loader))  # average loss per batch
    print('Relative loss over epoch %s: %s' % (epoch, relative_loss))
    return relative_loss  # return the relative loss for later analysis
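
For context, train() relies on a globally defined optimizer and train_loader. A minimal sketch of the surrounding driver code (the Adam choice and learning rate here are placeholders so the snippet is self-contained, not necessarily my exact settings):

import torch.optim as optim

model = LSTM_RNN(no_classes)  # no_classes = size of the one-hot alphabet
optimizer = optim.Adam(model.parameters(), lr=0.001)  # placeholder optimizer/lr

for epoch in range(10):
    train(model, epoch)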