High memory usage while training

Why do we add subgraphs to its history? Is it because loss still requires grad after loss.backward()?
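
For reference, here is a minimal sketch of the pattern this question seems to be about, assuming PyTorch and a training loop that sums the loss for logging. The model, the data, and the names running_tensor and running_float are hypothetical and not from the original post. It checks that loss still has requires_grad set after loss.backward(), so a running total built from it is itself a tensor with a grad_fn, whereas loss.item() gives a plain Python float.

```python
import torch

# Hypothetical toy model and data, only for illustration.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

running_tensor = 0.0  # becomes a tensor that references each step's loss
running_float = 0.0   # stays a plain Python float

for _ in range(3):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

    # loss is still a differentiable tensor here, so this addition is
    # recorded by autograd and keeps a reference to the previous total.
    running_tensor = running_tensor + loss

    # .item() copies the scalar value out as a float, with no autograd link.
    running_float += loss.item()

print(loss.requires_grad)        # flag on the loss after backward()
print(running_tensor.grad_fn)    # autograd node attached to the running total
print(type(running_float))
```

If the running total is only needed for logging, accumulating loss.item() (or loss.detach()) is the usual way to avoid holding on to that history.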