Does it load the whole dataset into the GPU?

In the PyTorch tutorial at http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#, the program contains:
def trainIters(…):

    training_pairs = [variablesFromPair(…)]
Because it uses .cuda(), does that mean the program loads the whole dataset into the GPU? That would need memory on the order of n_iters * (sentence length). Is this the usual way to do it?
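To make the memory concern concrete, here is a rough back-of-the-envelope sketch. The helper pair_memory_bytes and all the numbers in it are illustrative assumptions (not values from the tutorial): each training pair is taken to be two int64 index tensors (input and target sentence), 8 bytes per index, ignoring per-tensor allocation overhead.

```python
# Rough estimate of GPU memory used if all training pairs are
# precomputed and moved to the GPU up front, as the question describes.
# pair_memory_bytes is a hypothetical helper; all figures are assumptions.

def pair_memory_bytes(n_pairs, avg_sentence_len, tensors_per_pair=2, dtype_bytes=8):
    """Approximate bytes for n_pairs of (input, target) index tensors,
    assuming each tensor holds avg_sentence_len int64 word indices."""
    return n_pairs * tensors_per_pair * avg_sentence_len * dtype_bytes

# e.g. 75,000 iterations with ~10 tokens per sentence:
total = pair_memory_bytes(75_000, 10)
print(total / 1e6, "MB")  # → 12.0 MB
```

Under these assumptions the precomputed index tensors stay in the low tens of megabytes, which may be why precomputing them is tolerable here, though whether it is "usual" is exactly what the question asks.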