Using a bidirectional nn.GRU Gated Recurrent Unit understand forwarding process

Hi,
I have to use a nn.GRU for Character Prediction in short sentences. These Characters might be at any Place inside a String. The GRU has to be bidirectional and I do have to solve a few Tasks.
1: Split the output into a forward hald and a backward halt inside the forwarding of the model.
2: Train using the parameters: Epoch and unit (hiddenunits in Model) to reach a Cross Entropy Loss of below 0.2

I did try to find a Solution but to be honest i neither understand the Structure of the Output nor i do unterstand how to slice and concenate for a bidirectional approach For the first Task everything should be done using 4 lines and inside this code:

def forward(self, x):
        # x shape: (batch, seq_len)
        units = self.gru.hidden_size
        E = self.embedding(x)                # -> (batch, seq_len, embed_dim)
        predseq, _ = self.gru(E)             # -> (batch, seq_len, 2*units)
        
        # Now predseq has shape [batch_size, seq_len, 2*units],
        # where the first half of the last dimension is for the forward pass
        # and the second half of the last dimension is for the backward pass.
        # Compare to (4) and (5) from rnn.pdf
        
        # YOUR CODE HERE to create tensors forward and backward of shape (batch, seq_len-2, units), ~4 lines
        batch = x.shape[0]
        #print("batch")
        #print(batch)
        seq_len = x.shape[1]
                
        forward = predseq[:batch, :seq_len-2, :units]
        predseq_flipped = torch.flip(predseq, dims=[1]) 
        backward = predseq[:batch,:seq_len-2, :units] 
        
        
        #forward = predseq[:batch, :seq_len-2, :units]   # (batch, seq_len-2, units)
        #backward = predseq.flip(dims=[0])[:batch,:seq_len-2, :units]#  # (batch, seq_len-2, units)
        # YOUR CODE ENDS
        # Concatenate the forward and backward hidden states again so that
        # at time index t we have hidden states that can legitimately be used to 
        # predict character t without 'cheating', i.e. without having used x_t to compute it.
        fb = torch.cat([forward, backward], dim=-1)    # -> (batch, seq_len-2, 2*hidden_units)
        
        # The prediction of the s probabilities for each character and position is done
        # as before - with a vanilla fully-connected NN layer.
        # Note: during training we use CrossEntropyLoss so no explicit softmax here.
        logits = self.fc(fb)                           # -> (batch, seq_len-2, s)
        return logits

I really did try to solve this Task. I am right now stuck at a point where i have no Idea wheater this cold be improved.

And i did “solve” this by patience and trying maybe someone could held me to understand the Data Structure (Dimensions and the slicing)

For the Parameters, i am working on them and trying but i do believe my implementation of the forwarding is the reason my model does not perform as expected.

As you can imagine this is a training assignment and i really would like to understand how this should be implemented (forwarding). Please feel free to ask for additional Information, if necessary.