Pre-allocate memory in case of variable input length?

In the PyTorch tutorials, I found this article.

It says that if you have a dataset with variable input lengths, then pre-allocating memory for the maximum input length can help avoid OOM errors.
Pre-allocation of memory can be done by the following steps (a minimal code sketch follows the list):

  1. generate a (usually random) batch of inputs with maximum sequence length (either corresponding to max length in the training dataset or to some predefined threshold)
  2. execute a forward and a backward pass with the generated batch; do not execute an optimizer or a learning rate scheduler. This step pre-allocates buffers of maximum size, which can be reused in subsequent training iterations
  3. zero out gradients
  4. proceed to regular training
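
For concreteness, here is a minimal sketch of those four steps. Everything named here (`Classifier`, `MAX_SEQ_LEN`, `BATCH_SIZE`, and so on) is a hypothetical placeholder, not something from the tutorial; you would substitute your own model and maximum length.

```python
import torch
import torch.nn as nn

device = torch.device("cuda")

# Hypothetical sizes -- replace with your own dataset / model values.
MAX_SEQ_LEN = 512       # max length in the training set, or a chosen threshold
BATCH_SIZE = 32
VOCAB_SIZE = 10_000
NUM_CLASSES = 5

class Classifier(nn.Module):
    """A small placeholder model that accepts variable-length token sequences."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 128)
        self.lstm = nn.LSTM(128, 256, batch_first=True)
        self.head = nn.Linear(256, NUM_CLASSES)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.head(out[:, -1])  # classify from the last time step

model = Classifier().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

def preallocate_max_length_buffers():
    # Step 1: a random batch at the maximum sequence length.
    inputs = torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, MAX_SEQ_LEN), device=device)
    targets = torch.randint(0, NUM_CLASSES, (BATCH_SIZE,), device=device)

    # Step 2: forward + backward only -- no optimizer.step(), no scheduler.step().
    loss = criterion(model(inputs), targets)
    loss.backward()

    # Step 3: zero out the gradients from the dummy batch
    # (set_to_none=False keeps the grad tensors allocated and just zeroes them).
    optimizer.zero_grad(set_to_none=False)
```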

My question is: should I do steps 1-3 at the beginning of every iteration, or only once at the beginning of the training code?

Also, is it related to this fragmentation problem?

You should do it once before the actual training starts, as the memory will be pre-allocated and moved to the cache afterwards. As long as you don't clear the cache via torch.cuda.empty_cache(), you won't have to rerun it.
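
Concretely (reusing the hypothetical names from the sketch in the question), the warm-up is just called a single time before entering the training loop:

```python
preallocate_max_length_buffers()  # steps 1-3, run once

# Step 4: regular training; the cached blocks from the warm-up get reused.
for epoch in range(num_epochs):            # num_epochs / train_loader are placeholders
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs.to(device)), targets.to(device))
        loss.backward()
        optimizer.step()
```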