GRU for Multi-dimensional Input

Hello, I have an input of shape (14, 10, 30, 300), where 14 is the batch_size, 10 is the seq_len, 30 is the num_tokens in each element of the sequence, and 300 is the embedding_dim for each token. (I know, it’s complicated!)

I want to process each of the 10 elements in the sequence through a GRU and convert it to a 512-dimensional output. That is, I want to take each 30 x 300 element of the sequence and learn to map it to a 512-dimensional vector.

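Here’s what I have. I’m using the GRU’s final hidden state (the second value it returns), concatenating the forward and backward directions to get 2 x 256 = 512 features. I only need one vector per element because my input represents textual features and I’ll be combining it with visual features later. Would love some feedback on whether this is the correct approach. Thank you.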

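import torch
import torch.nn as nn

gru = nn.GRU(
    input_size=300,
    hidden_size=256,
    batch_first=True,
    bidirectional=True
)

inputs = torch.randn(14, 10, 30, 300)  # placeholder for the real embeddings

# Reshape input from (14, 10, 30, 300) -> (14*10, 30, 300)
inputs = inputs.reshape(14 * 10, 30, 300)
# The second return value has shape (2, 140, 256): one final hidden state per direction
_, final_hidden_layer = gru(inputs)
# Move batch first, then concatenate the two directions
outputs = final_hidden_layer.permute(1, 0, 2).contiguous().flatten(start_dim=1)  # (140, 512)
outputs = outputs.reshape(14, 10, 512)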

@ptrblck @ParGG

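With batch_first=True, the GRU takes a sequence with dimensions (BatchSize, SequenceLength, InputFeatures); see the GRU page of the PyTorch 1.12 documentation. If you want to map each 30 x 300 element to 512, you need to reshape your input as inputs = inputs.reshape(14, 10, 30 * 300). Each of the 10 elements then becomes one time step with 9000 features, so the GRU's input_size has to be 30 * 300 as well.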
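A minimal sketch of that reshape, in case it helps. The random tensor stands in for the real embeddings, and `gru_flat` is a hypothetical name for a GRU rebuilt with `input_size=30 * 300`; neither is from the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
inputs = torch.randn(14, 10, 30, 300)  # placeholder data, not the real features

# Assumption: the GRU is rebuilt so that one time step is a flattened 30 x 300 element.
gru_flat = nn.GRU(
    input_size=30 * 300,
    hidden_size=256,
    batch_first=True,
    bidirectional=True,
)

# (14, 10, 30, 300) -> (14, 10, 9000): the GRU now runs over the 10 elements
flat = inputs.reshape(14, 10, 30 * 300)
per_step, h_n = gru_flat(flat)

per_step_shape = tuple(per_step.shape)  # one 512-dim vector per element
h_n_shape = tuple(h_n.shape)            # final hidden state per direction
print(per_step_shape, h_n_shape)
```

Note that this is not equivalent to the code in the question: here the first return value already has one 512-dimensional vector per element, and each vector depends on the other elements of the sequence, whereas the (14*10, 30, 300) reshape encodes every element independently over its 30 tokens.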

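I would suggest checking the documentation for what the two outputs of the GRU represent. Once you have multiple layers, the second output might not match your expectations: it holds the final hidden state of every layer and direction, so its first dimension grows with num_layers.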
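To make the multi-layer caveat concrete, here is a rough sketch with `num_layers=2` (an assumed value, the original code uses the default of 1) and random input. It shows that the permute-and-flatten from the question no longer yields 512 features, and one possible way to keep only the last layer's two directions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_layers = 2  # assumption for illustration only

gru = nn.GRU(
    input_size=300,
    hidden_size=256,
    num_layers=num_layers,
    batch_first=True,
    bidirectional=True,
)
x = torch.randn(14 * 10, 30, 300)  # placeholder for the reshaped input

# output: last layer's hidden state at every time step, both directions concatenated
# h_n: final hidden state for every layer and direction
output, h_n = gru(x)

# The flatten from the question now mixes both layers into one vector
naive = h_n.permute(1, 0, 2).flatten(start_dim=1)

# Separate layers from directions and keep only the last layer
last = h_n.reshape(num_layers, 2, x.size(0), 256)[-1]  # (2, 140, 256)
encoded = torch.cat([last[0], last[1]], dim=1)         # (140, 512)

# Sanity check: the forward state is output at the last step,
# the backward state is output at the first step
fwd_matches = torch.allclose(encoded[:, :256], output[:, -1, :256], atol=1e-5)
bwd_matches = torch.allclose(encoded[:, 256:], output[:, 0, 256:], atol=1e-5)
print(tuple(naive.shape), tuple(encoded.shape), fwd_matches, bwd_matches)
```

With a single layer the two versions coincide, which is why the code in the question gives (140, 512) as written.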