Difference in GPU memory usage patterns between training and inference

During a typical training step for a neural network model, it seems we need to keep all the activations we have computed so that we can calculate the gradients during the backward step. However, for inference (say, under torch.no_grad() or torch.inference_mode()), does this statement hold true? It seems that at any moment during a forward pass, we only need to keep the activations that are still required by future layers, and the memory reserved for earlier activations can be freed or reused.
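
To illustrate what I mean, here is a minimal sketch (the model architecture, layer sizes, and batch size are arbitrary choices of mine, and it assumes a CUDA device is available) that measures peak GPU memory for a single forward pass with and without gradient tracking:

```python
import torch
import torch.nn as nn

# A deep stack of large linear layers, so that intermediate
# activations dominate the memory footprint.
model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(16)]).cuda()
x = torch.randn(64, 4096, device="cuda")

def peak_forward_mem(enable_grad: bool) -> int:
    """Run one forward pass and return peak allocated GPU memory in bytes."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    with torch.set_grad_enabled(enable_grad):
        model(x)
    return torch.cuda.max_memory_allocated()

print(f"forward with grad:    {peak_forward_mem(True) / 1e9:.2f} GB")
print(f"forward without grad: {peak_forward_mem(False) / 1e9:.2f} GB")
```

If my understanding is right, the grad-enabled pass should peak much higher: autograd saves each layer's input for the backward step, so the saved activations accumulate with depth. With gradients disabled, each intermediate activation should become freeable as soon as the next layer has consumed it.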