Handling variable sequence lengths / effect of zero-padding the input

I am trying to train a model on video data with highly variable sequence lengths, anywhere from 500 to 3460 frames. My model includes GCN, CNN, and Linear/GRU layers. I am unsure which of the following is the better approach:

- Zero pad to the maximum frame length (say, 3500). Does it matter if it is in the front or back?
- Repeat frames until the sequence reaches the maximum frame length

I am working on action quality assessment, so I assume repeating is not a good idea: samples with fewer frames would contain more repetitions of the same movement than actually occurred, which changes the nature of the action. But if I zero-pad, how will that affect the quality prediction? A very good action can be very short or very long, and so can a very poor one.

Or

- Figure out a way to work with mismatched lengths
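
To make the last option concrete, here is a minimal sketch of what I have in mind, in plain Python rather than any particular framework. It pads at the back, keeps a 0/1 mask of the real frames, and pools over the real frames only. The names `pad_back`, `masked_mean`, and `naive_mean`, the toy 2-dimensional frame features, and the mean-pooling step are all my own assumptions for illustration, not part of my actual model.

```python
from typing import List, Tuple

Frame = List[float]  # one feature vector per frame (hypothetical representation)


def pad_back(seqs: List[List[Frame]], max_len: int) -> Tuple[List[List[Frame]], List[List[int]]]:
    """Zero-pad each sequence at the back and return a 0/1 mask of real frames."""
    dim = len(seqs[0][0])
    padded, masks = [], []
    for seq in seqs:
        if len(seq) > max_len:
            raise ValueError("sequence longer than max_len")
        n_pad = max_len - len(seq)
        # Real frames first, then all-zero frames up to max_len.
        padded.append([list(f) for f in seq] + [[0.0] * dim for _ in range(n_pad)])
        # 1 marks a real frame, 0 marks padding.
        masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks


def masked_mean(seq: List[Frame], mask: List[int]) -> Frame:
    """Average only the real frames, so the padding does not dilute the result."""
    n_real = sum(mask)
    dim = len(seq[0])
    return [sum(f[d] for f, m in zip(seq, mask) if m) / n_real for d in range(dim)]


def naive_mean(seq: List[Frame]) -> Frame:
    """Average over every position, padding included (for comparison)."""
    dim = len(seq[0])
    return [sum(f[d] for f in seq) / len(seq) for d in range(dim)]


# Toy data: a 2-frame clip and a 4-frame clip with 2 features per frame.
short_clip = [[2.0, 4.0], [4.0, 8.0]]
long_clip = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [6.0, 2.0]]

padded, masks = pad_back([short_clip, long_clip], max_len=4)
pooled_masked = masked_mean(padded[0], masks[0])
pooled_naive = naive_mean(padded[0])
```

In this toy example the naive mean of the short clip is halved by the padding, while the masked mean is unchanged. That length dependence is exactly what worries me about plain zero padding, and I do not know whether a mask like this carries over cleanly to the GCN, CNN, and GRU layers.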

Besides, as you can see, the sequences are very long. What would be the best way to capture temporal relationships over such long sequences?
Thanks for any suggestions.