How are you measuring time?
If each new iteration is taking longer, first make sure you’re measuring run time accurately (see "How to measure execution time in PyTorch?"). CUDA operations run asynchronously, so unless you synchronize the GPU before reading the clock, your measurements will be inaccurate.
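As a rough sketch of synchronized timing (the helper name `timed` and the matmul workload are only illustrative, not taken from the linked thread), you can call `torch.cuda.synchronize()` before starting and before stopping the clock. The sketch falls back to plain CPU timing when no GPU is available.

```python
import time
import torch

def timed(fn, *args, warmup=2, repeats=5):
    """Return mean wall-clock seconds per call of fn(*args).

    Hypothetical helper: synchronizes the GPU (if present) so that
    queued kernels are finished before the clock is read.
    """
    use_cuda = torch.cuda.is_available()
    for _ in range(warmup):          # warm-up runs are not timed
        fn(*args)
    if use_cuda:
        torch.cuda.synchronize()     # wait for pending GPU work
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    if use_cuda:
        torch.cuda.synchronize()     # wait again before stopping the clock
    return (time.perf_counter() - start) / repeats

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(256, 256, device=device)
seconds_per_matmul = timed(torch.matmul, a, a)
print(f"{seconds_per_matmul * 1e3:.3f} ms per matmul on {device}")
```

Timing a full training iteration works the same way: wrap one step in a function and pass it to the helper.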
detach() to reduce Autograd operations
At every iteration of your network training, PyTorch dynamically constructs a computational graph of all the operations. This graph contains the tensors that require gradients. When we call backward on a tensor (e.g. loss), Autograd backpropagates through the graph that computed the tensor and then frees the graph to save memory.
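A tiny sketch of that behaviour, using a made-up one-parameter "model" purely for illustration: after the first `backward()` the graph's saved buffers are released, so a second `backward()` on the same tensor raises a `RuntimeError` unless `retain_graph=True` was passed.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
loss = (w * 3.0) ** 2        # graph: w -> mul -> pow

loss.backward()              # computes d(loss)/dw and frees the graph
grad_value = w.grad.item()   # d/dw (3w)^2 = 18w

try:
    loss.backward()          # graph buffers are already freed
    graph_was_freed = False
except RuntimeError:
    graph_was_freed = True

print(grad_value, graph_was_freed)
```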
In your code, you might be creating tensors that don’t need gradients (e.g. hidden states carried across time steps in an RNN). If they remain part of the computational graph, Autograd will backpropagate through them as well, increasing time and memory consumption. Calling detach() returns a tensor that shares the same data but is cut off from the graph and does not require gradients. Refer to this example illustrating
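As a separate minimal sketch (not the example linked above), here is one way the RNN case might look. The layer sizes, random data, and optimizer are arbitrary assumptions; the point is the `hidden.detach()` call, which stops each iteration from backpropagating into the previous iterations' graphs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
params = list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.SGD(params, lr=0.01)

hidden = torch.zeros(1, 2, 8)          # (num_layers, batch, hidden_size)
step_losses = []
for step in range(3):
    x = torch.randn(2, 5, 4)           # (batch, seq_len, input_size)
    y = torch.randn(2, 1)

    # Keep the hidden state's values but drop its history, so the
    # graph does not grow across iterations.
    hidden = hidden.detach()

    out, hidden = rnn(x, hidden)
    loss = nn.functional.mse_loss(head(out[:, -1]), y)

    opt.zero_grad()
    loss.backward()
    opt.step()
    step_losses.append(loss.item())    # store a float, not the tensor

carried = hidden.detach()              # safe to keep after the loop
```

Without the `detach()` call, the second iteration's `backward()` would try to walk back into the first iteration's already-freed graph and fail.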
A common error is to keep a reference to a graph-attached tensor (such as the loss) outside the training loop; this prevents the graph from being freed after every iteration. Instead, use item() to store just the scalar value (or detach() if you need it as a tensor).
    losses = []
    for epoch in range(10):
        pred = model(x)
        loss = loss_fn(pred, y)  # PyTorch losses take (input, target)
        loss.backward()
        # losses.append(loss)  # This accumulates history. AVOID!
        losses.append(loss.item())  # Do this instead
Data Loading bottleneck
Data loading is a CPU process and can be a significant bottleneck (see "Data loader takes a lot of time for every nth iteration"). Take a look at @rwightman’s excellent set of tips for making data loading faster in "How to prefetch data when processing with GPU?" (post #19 by rwightman).
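A common first step, sketched here with a synthetic dataset and arbitrary settings (this is not a summary of the linked tips), is to let `DataLoader` load batches in background worker processes and to use pinned memory when a GPU is present. The helper name `make_loader` and the worker count are assumptions; the right values depend on your CPU and storage.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loader(dataset, batch_size=16, num_workers=2):
    """Hypothetical helper: build a DataLoader with background workers."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,               # load batches in subprocesses
        pin_memory=torch.cuda.is_available(),  # faster host-to-GPU copies
        persistent_workers=num_workers > 0,    # keep workers alive across epochs
    )

# Synthetic stand-in for a real dataset.
dataset = TensorDataset(torch.randn(64, 3), torch.randint(0, 2, (64,)))
loader = make_loader(dataset)

device = "cuda" if torch.cuda.is_available() else "cpu"
n_batches, n_samples = 0, 0
for xb, yb in loader:
    # non_blocking lets the copy overlap with compute when memory is pinned
    xb = xb.to(device, non_blocking=True)
    yb = yb.to(device, non_blocking=True)
    n_batches += 1
    n_samples += xb.shape[0]

print(n_batches, n_samples)
```

On platforms that start worker processes with spawn (Windows, macOS), the loop should sit under an `if __name__ == "__main__":` guard.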