Time/Memory keeps increasing at every iteration

suraj.pt · February 9, 2021, 8:03pm

How are you measuring time?

If each new iteration is taking longer, first make sure you’re measuring run time accurately (see "How to measure execution time in PyTorch?"). CUDA operations run asynchronously, so unless you synchronize the GPU before reading the clock, your measurements will be inaccurate.
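As a rough sketch of what synchronized timing can look like (the helper name `timed` and the matrix sizes are made up for illustration, and the code falls back to plain CPU timing when no GPU is present):

```python
import time
import torch

def timed(fn, *args, warmup=1, repeats=5):
    """Return the average wall-clock seconds per call of fn(*args)."""
    use_cuda = torch.cuda.is_available()
    for _ in range(warmup):
        fn(*args)  # warm-up calls are excluded from the measurement
    if use_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    if use_cuda:
        torch.cuda.synchronize()  # wait for the timed GPU work to actually finish
    return (time.perf_counter() - start) / repeats

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(512, 512, device=device)
avg_seconds = timed(torch.matmul, x, x)
print(f"average time per matmul: {avg_seconds:.6f} s")
```

Without the two synchronize calls, a GPU run would mostly measure how long it takes to queue the kernels, not how long they take to execute.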

Using `detach()` to reduce Autograd operations

At every iteration of your network training, PyTorch dynamically constructs a computational graph of all the operations performed on tensors that require gradients. When we call backward() on a tensor (e.g., the loss), Autograd backpropagates through the graph that produced that tensor and then frees the graph to save memory.

In your code, you might be creating tensors that stay attached to the graph even though you don’t need gradients to flow through them (e.g., hidden states carried across time steps in an RNN). If they remain part of the computational graph, Autograd will backpropagate through them as well, resulting in increased time or memory consumption. Calling detach() returns a tensor that shares the same data but is cut off from the graph and does not require gradients. Refer to this example illustrating detach().
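For the RNN case, a minimal sketch might look like the following. The model, the shapes, and the random data are all made up for illustration; the relevant part is the detach() call at the end of each step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
optimizer = torch.optim.SGD(
    list(rnn.parameters()) + list(head.parameters()), lr=0.01
)

hidden = torch.zeros(1, 2, 8)  # (num_layers, batch, hidden_size)
for step in range(3):
    chunk = torch.randn(2, 5, 4)  # toy data: (batch, seq_len, input_size)
    target = torch.randn(2, 1)

    optimizer.zero_grad()
    out, hidden = rnn(chunk, hidden)
    loss = nn.functional.mse_loss(head(out[:, -1]), target)
    loss.backward()
    optimizer.step()

    # Carry the hidden state's values into the next chunk, but cut its
    # link to this chunk's graph so backward() stops at the chunk boundary.
    hidden = hidden.detach()
```

If the hidden state were passed on without detach(), each backward() would have to reach back through all earlier chunks, or fail because that part of the graph has already been freed.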

A common error is to keep a reference to a graph-attached tensor (e.g., the loss) in a variable or list that lives outside the training loop; this prevents the graph from being freed after every iteration. Instead, use item() to store just the scalar value (or detach() if you need it as a tensor).

losses = []
for epoch in range(10):
    pred = model(x)
    loss = loss_fn(pred, y)  # PyTorch loss functions take (input, target)
    loss.backward()
    # losses.append(loss)  # This accumulates history. AVOID!
    losses.append(loss.item())  # Do this instead: stores a plain Python float

Data Loading bottleneck

Data loading is a CPU process and can be a significant bottleneck (see "Data loader takes a lot of time for every nth iteration"). Take a look at @rwightman’s excellent set of tips for making data loading faster in "How to prefetch data when processing with GPU?" (post #19).
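As a starting point, and not a substitute for the tips linked above, here is a sketch of the DataLoader settings that usually matter most. The helper name `build_loader`, the toy dataset, and the default values are assumptions chosen for illustration; good values depend on your machine and data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def build_loader(dataset, batch_size=32, num_workers=2):
    """Build a DataLoader that loads batches in background worker processes."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,               # > 0 moves loading off the main process
        pin_memory=torch.cuda.is_available(),  # page-locked memory speeds host-to-GPU copies
        persistent_workers=num_workers > 0,    # keep workers alive between epochs
    )

# Toy in-memory dataset standing in for a real one.
dataset = TensorDataset(torch.randn(256, 8), torch.randint(0, 2, (256,)))
```

You would then call something like `build_loader(dataset, num_workers=4)` and, with pinned memory, move each batch using `batch.to(device, non_blocking=True)`. On platforms that start workers with spawn (Windows, macOS), the code that iterates the loader needs to sit under an `if __name__ == "__main__":` guard.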
