How to properly unsort an unpacked sequence and also get all the hidden states

In my network, I’m using packing for variable-length sequence inputs to the GRU (please see the code below). I still have the indices from sorting (the variable ‘order’).
I’m doing the unpacking and unsorting as in the following code:

    new_s, new_s_lengths = nn.utils.rnn.pad_packed_sequence(s)  # s is the PackedSequence
    output = unscramble(new_s, new_s_lengths, order, batch_size)

The function unscramble does the unsorting and is defined as follows:

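import numpy as np
import torch
from torch.autograd import Variable

def unscramble(output, lengths, original_indices, batch_size):
    """
    Takes the output from the model, the lengths, the original_indices, and the batch size.
    Unscrambles the data, which had been sorted to make pack_padded_sequence work.
    Returns the unscrambled and unpadded outputs.
    """
    # Index of the last valid time step of every sequence, expanded to [1, batch, hidden]
    final_ids = (Variable(torch.from_numpy(np.array(lengths) - 1))).view(-1, 1).expand(output.size(1), output.size(2)).unsqueeze(0)
    if cuda:  # cuda is a flag set elsewhere in my script
        final_ids = final_ids.to('cuda')
    # Pick output[lengths[b] - 1, b, :] for every batch element b
    final_outputs = output.cpu().gather(0, final_ids.cpu()).squeeze()
    # Reorder the batch with the indices saved when sorting
    unscrambled_outputs = final_outputs[original_indices]
    return unscrambled_outputs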
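
To show what the gather step in unscramble selects, here is a small sketch on toy data. The tensor sizes and the names max_len, batch, hidden and last_steps are made up for illustration and are not from my model. It assumes the padded output has the default [max_len, batch, hidden] layout that pad_packed_sequence returns.

```python
import torch

# Toy padded output shaped [max_len, batch, hidden] (illustrative sizes only).
max_len, batch, hidden = 4, 3, 2
output = torch.arange(max_len * batch * hidden, dtype=torch.float32).view(max_len, batch, hidden)
lengths = torch.tensor([4, 2, 1])  # sorted in descending order, as packing expects

# Index of the last valid time step per sequence, broadcast to [1, batch, hidden].
final_ids = (lengths - 1).view(1, -1, 1).expand(1, batch, hidden)

# gather along dim 0 picks output[lengths[b] - 1, b, :] for every batch element b.
last_steps = output.gather(0, final_ids).squeeze(0)

print(last_steps.shape)  # one vector per sequence: [batch, hidden]
```

So only one time step per sequence survives this step, which is why the result loses the time dimension.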

However, this is not my problem. The problem is that this function returns only the last hidden state of the GRU for each sequence. Let’s say I have defined a GRU in the following way:

self.lstm = nn.GRU(300, 400, 1)

and an input of shape [48, 128, 300] (48 is the length of the longest sequence and 128 is the batch size). After packing the sequence, unpacking it, and calling the unscramble function defined above, I get a new tensor of shape [128, 400] (the last hidden state of each sequence, in the original order from before sorting by length).
My question is: how can I get the output of all of the hidden states (in my example, a tensor of shape [128, 48, 400])?
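
To make the shapes concrete, here is a minimal sketch of the setup with random data. It rests on some assumptions: ‘order’ is taken to be the permutation returned by sorting the lengths in descending order, so going back to the original order uses its inverse (computed with argsort), and all variable names are made up for illustration. Under those assumptions, the padded tensor from pad_packed_sequence seems to already hold every time step, and asking for batch_first=True plus unsorting gives the [128, 48, 400] shape I am after.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
max_len, batch_size, in_dim, hid_dim = 48, 128, 300, 400

# Random lengths in the original (unsorted) order; force one full-length sequence.
lengths = torch.randint(1, max_len + 1, (batch_size,))
lengths[0] = max_len
inputs = torch.randn(max_len, batch_size, in_dim)  # [48, 128, 300]

# Sort by length, descending. Assumption: this permutation is what 'order' holds.
sorted_lengths, order = lengths.sort(descending=True)
packed = pack_padded_sequence(inputs[:, order], sorted_lengths)

gru = nn.GRU(in_dim, hid_dim, 1)
packed_out, h_n = gru(packed)

# batch_first=True returns the padded outputs as [batch, max_len, hidden].
padded, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# Inverse permutation: row i of the result is the sequence that was originally at index i.
inverse = torch.argsort(order)
all_states = padded[inverse]  # [128, 48, 400], original batch order

print(all_states.shape)
```

Positions past each sequence's length are zero padding, so out_lengths[inverse] is still needed to know where each sequence really ends.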