What is the best input size for the GRU in an MLP-GRU model?

I am working on action recognition. Each frame has shape [3, 50]. I feed the frames into an MLP, and the MLP's output is then fed to my GRU. Should the input_size of my GRU (that is, the output size of the MLP) be 3x50 = 150?
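A minimal sketch of this pipeline follows. It assumes PyTorch (the `input_size` name matches `torch.nn.GRU`) and input clips shaped (batch, time, 3, 50). The class name `MLPGRU` and the sizes `mlp_dim`, `gru_hidden`, and `num_classes` are hypothetical choices for illustration only. The point it shows is that the GRU's `input_size` just has to equal the width of the MLP's output, which is a hyperparameter you pick, not something fixed at 3 * 50.

```python
import torch
import torch.nn as nn


class MLPGRU(nn.Module):
    """Hypothetical per-frame MLP encoder followed by a GRU over time."""

    def __init__(self, frame_shape=(3, 50), mlp_dim=128, gru_hidden=64, num_classes=10):
        super().__init__()
        in_features = frame_shape[0] * frame_shape[1]  # 3 * 50 = 150 after flattening a frame
        # The MLP maps each flattened frame to an mlp_dim-dimensional feature vector.
        self.mlp = nn.Sequential(
            nn.Linear(in_features, mlp_dim),
            nn.ReLU(),
            nn.Linear(mlp_dim, mlp_dim),
            nn.ReLU(),
        )
        # The GRU's input_size must equal the MLP's output width (mlp_dim);
        # mlp_dim is an assumed hyperparameter and need not be 150.
        self.gru = nn.GRU(input_size=mlp_dim, hidden_size=gru_hidden, batch_first=True)
        self.classifier = nn.Linear(gru_hidden, num_classes)

    def forward(self, x):
        # x: (batch, time, 3, 50)
        b, t = x.shape[:2]
        # Flatten each frame to 150 values; nn.Linear acts on the last dimension.
        feats = self.mlp(x.reshape(b, t, -1))  # (batch, time, mlp_dim)
        out, h_n = self.gru(feats)             # out: (batch, time, gru_hidden)
        return self.classifier(h_n[-1])        # logits from the last hidden state


model = MLPGRU()
clips = torch.randn(4, 16, 3, 50)  # dummy data: 4 clips of 16 frames each
logits = model(clips)
print(logits.shape)  # torch.Size([4, 10])
```

Setting `mlp_dim=150` would also work, but nothing forces it; in this sketch it is treated like any other layer width and could be tuned on validation data.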

Thank you in advance.