Best practices for training an RNN on a GPU

Problem

I tried to train my RNN on Google Colab (which offers a free GPU), but the runtime does not seem to decrease much, which makes me doubt whether I have processed the data correctly. Specifically:

  • Is it true that the runtime can only be reduced when the data is sent to the network in batches? In other words, if the data is fed into the network one sample at a time, is there no hope of a speedup?
  • If the first point is true, does this mean that the data fed in has to have the same input dimension?
    For example, say I would like to pass the words {“my”, “name”, “is”} into the network for some classification task, and I decide to use one-hot encoding. Then “my” and “is” would have to be padded to the same length as “name” (see the sketch after this list).
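
To make the padding question concrete, here is a rough sketch of what I imagine the batched version would look like. I wrote it in PyTorch purely as an illustration; the character-level one-hot encoding, the pad_sequence/pack_padded_sequence calls, and the hidden size of 32 are placeholder choices, not what I actually have.

```python
import string

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Character-level one-hot encoding: each word becomes a
# (word_length, alphabet_size) tensor of one-hot rows.
alphabet = string.ascii_lowercase
char_to_idx = {c: i for i, c in enumerate(alphabet)}

def one_hot_word(word):
    indices = torch.tensor([char_to_idx[c] for c in word])
    return nn.functional.one_hot(indices, num_classes=len(alphabet)).float()

words = ["my", "name", "is"]
sequences = [one_hot_word(w) for w in words]   # lengths 2, 4, 2
lengths = torch.tensor([len(w) for w in words])

# pad_sequence zero-pads the shorter sequences up to the longest one,
# producing a single (batch, max_len, alphabet_size) tensor.
batch = pad_sequence(sequences, batch_first=True).to(device)

rnn = nn.RNN(input_size=len(alphabet), hidden_size=32, batch_first=True).to(device)

# Packing lets the RNN skip the padded timesteps entirely.
packed = pack_padded_sequence(batch, lengths, batch_first=True, enforce_sorted=False)
output, hidden = rnn(packed)

print(hidden.shape)  # (1, 3, 32): one final hidden state per word in the batch
```

Is something along these lines required to actually see a GPU speedup, or is feeding one word at a time fine?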

Thank you for any input.