Do you have examples of how to use torch.utils.checkpoint?

I have read this part in the docs. For fear that I might have misunderstood it, I feel I need to ask: does this feature reduce memory usage by not saving some intermediate results of the forward pass and recomputing them when needed in the backward pass? If so, could you please show some examples of how to use it in the training process?

Yes, it works by recomputing some intermediate values rather than storing them, which reduces memory usage at the cost of a slight increase in execution time.
A nice example is the efficient densenet.
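Here is a minimal sketch of one way to use it in a training loop. The module names and layer sizes are placeholders, not from any particular model, and recent PyTorch versions may also ask you to pass `use_reentrant` explicitly:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Arbitrary sizes, just for illustration.
        self.block1 = nn.Sequential(nn.Linear(128, 128), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(128, 128), nn.ReLU())
        self.head = nn.Linear(128, 10)

    def forward(self, x):
        # Activations inside block1 and block2 are not stored;
        # they are recomputed during the backward pass.
        x = checkpoint(self.block1, x)
        x = checkpoint(self.block2, x)
        return self.head(x)

model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# The input to a checkpointed segment must require grad (or the segment
# must contain parameters); otherwise no gradient flows back through it.
inputs = torch.randn(32, 128, requires_grad=True)
targets = torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()  # the checkpointed forwards are re-run here
optimizer.step()
```

Note that only the activations of the checkpointed segments are freed; the inputs to each segment are still kept so they can be replayed, which bounds how much memory you can save.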

Is there any trick to using this? I apply it to about 1/3 of my model (wrapped in a subclass of nn.Module), but I only see a small decline in memory usage.
