Implementing multiple recomputations on top of `torch.utils.checkpoint`

I haven’t looked much into PyTorch implementations for swapping, but I do recall there may have been some work on implementing something similar to TensorFlow Large Model Support (TFLMS), which was a swapping solution for TensorFlow.

A quick google search gave me this thread: Thoughts on use of CPU ram as a swap for GPU

Original TFLMS Paper: https://arxiv.org/pdf/1807.02037.pdf