Hi,
I have to use an nn.GRU for character prediction in short sentences. The characters to predict can be at any position inside a string. The GRU has to be bidirectional, and I have to solve a few tasks:
1: Split the output into a forward half and a backward half inside the model's forward method.
2: Train with the parameters epochs and units (the number of hidden units in the model) to reach a cross-entropy loss below 0.2.
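For context, my model is set up roughly like this (the class name here is made up, but the layer names match the forward method below):

import torch
import torch.nn as nn

class CharGRU(nn.Module):  # hypothetical name
    def __init__(self, s, embed_dim, units):
        super().__init__()
        # s = alphabet size, units = hidden units per direction
        self.embedding = nn.Embedding(s, embed_dim)
        # batch_first=True matches the (batch, seq_len, ...) shapes below
        self.gru = nn.GRU(embed_dim, units, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * units, s)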
I did try to find a solution, but to be honest I neither understand the structure of the output nor how to slice and concatenate for a bidirectional approach. For the first task, everything should be done in about 4 lines inside this code:
def forward(self, x):
    # x shape: (batch, seq_len)
    units = self.gru.hidden_size
    E = self.embedding(x)      # -> (batch, seq_len, embed_dim)
    predseq, _ = self.gru(E)   # -> (batch, seq_len, 2*units)
    # predseq has shape (batch, seq_len, 2*units), where the first half of the
    # last dimension comes from the forward pass and the second half from the
    # backward pass. Compare to (4) and (5) from rnn.pdf.
    # YOUR CODE HERE to create tensors forward and backward of shape
    # (batch, seq_len-2, units), ~4 lines
    batch = x.shape[0]
    seq_len = x.shape[1]
    # My attempt so far -- I suspect the slicing is wrong: forward and backward
    # come out identical, and predseq_flipped is computed but never used.
    forward = predseq[:batch, :seq_len-2, :units]
    predseq_flipped = torch.flip(predseq, dims=[1])
    backward = predseq[:batch, :seq_len-2, :units]
    # Also tried: backward = predseq.flip(dims=[0])[:batch, :seq_len-2, :units]
    # YOUR CODE ENDS
    # Concatenate the forward and backward hidden states again so that
    # at time index t we have hidden states that can legitimately be used to
    # predict character t without 'cheating', i.e. without having used x_t
    # to compute it.
    fb = torch.cat([forward, backward], dim=-1)  # -> (batch, seq_len-2, 2*units)
    # The prediction of the s probabilities for each character and position is
    # done as before, with a vanilla fully-connected layer.
    # Note: during training we use CrossEntropyLoss, so no explicit softmax here.
    logits = self.fc(fb)  # -> (batch, seq_len-2, s)
    return logits
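After staring at the shapes for a while, my current guess for the intended slicing is the following (unverified, so please correct me if I am misreading the direction layout of predseq):

# In PyTorch, for a bidirectional GRU with batch_first=True:
# predseq[:, t, :units] is the forward state after reading x_0 .. x_t
# predseq[:, t, units:] is the backward state after reading x_{T-1} .. x_t
# To predict character t without using x_t, I would need the forward state
# from t-1 and the backward state from t+1, i.e. for positions t = 1 .. seq_len-2:
forward = predseq[:, :-2, :units]   # h_fwd at 0 .. seq_len-3 -> (batch, seq_len-2, units)
backward = predseq[:, 2:, units:]   # h_bwd at 2 .. seq_len-1 -> (batch, seq_len-2, units)

Is this the intended reading?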
I really did try to solve this task, but I am stuck at a point where I have no idea whether my attempt can be improved. So far I “solved” it mostly by patience and trial and error; maybe someone could help me understand the data structure (the dimensions and the slicing).
As for the parameters, I am still experimenting with them, but I believe my implementation of the forward method is the reason my model does not perform as expected.
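For completeness, here is roughly how I train (loader, epochs, and the optimizer settings are placeholders I keep varying; the target slice assumes positions t = 1 .. seq_len-2 from my slicing guess above):

model = CharGRU(s, embed_dim, units)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(epochs):
    for x in loader:              # x: (batch, seq_len) of character indices
        logits = model(x)         # (batch, seq_len-2, s)
        targets = x[:, 1:-1]      # the predictable characters 1 .. seq_len-2
        # CrossEntropyLoss expects (N, C) vs. (N,), so flatten batch and time:
        loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()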
As you can imagine, this is a training assignment, and I really would like to understand how the forward method should be implemented. Please feel free to ask for additional information if necessary.