Is there a way to perform to(device) for optimizer state variables?

Hi all,

Because CUDA memory is limited, I need to move the last few batches (batches are ordered by input text size) from CUDA to the CPU. I use the to(device) function to move my model parameters and some other tensors. However, the optimizer keeps cached state (tensors separate from the model parameters it references) that stays in CUDA memory. Is there an easy way to move the optimizer state from CUDA to the CPU? I believe optimizers don't have a to(device) method yet. I could manually move every tensor in the optimizer state to the CPU, but I was hoping someone knew of an easier way that I missed.
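For reference, the manual approach mentioned above can be sketched as a small helper that walks `optimizer.state` (a dict keyed by parameter, each value a dict of per-parameter buffers such as Adam's moment estimates) and moves every tensor it finds. The name `optimizer_to` is hypothetical, not a PyTorch API. The sketch assumes the model is moved first with `model.to(...)`, which in default PyTorch settings swaps each parameter's data in place, so the optimizer still references the same parameter objects.

```python
import torch


def optimizer_to(optimizer: torch.optim.Optimizer, device) -> None:
    """Hypothetical helper: move every tensor in optimizer.state to `device`, in place.

    Only the cached state (e.g. Adam's exp_avg / exp_avg_sq and, in some
    PyTorch versions, a tensor `step` counter) is moved here. The parameters
    themselves should be moved separately via model.to(device).
    """
    for param_state in optimizer.state.values():
        for key, value in param_state.items():
            if torch.is_tensor(value):
                # Reassigning an existing key while iterating is safe:
                # the dict's size does not change.
                param_state[key] = value.to(device)


# Usage: build a tiny model, take one Adam step so the state is populated,
# then move both the model and the optimizer state to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(4, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(3, 4, device=device)).sum().backward()
optimizer.step()

model.to("cpu")                 # parameters first
optimizer_to(optimizer, "cpu")  # then the cached optimizer state
```

The same call with a CUDA device moves the state back when the smaller batches resume; it is worth checking how your PyTorch version expects the `step` entry to be placed before relying on that direction.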