Memory is allocated and never freed when using Module.cpu()

I create many modules while doing federated learning and keep them in CPU RAM, moving each model to CUDA only for training and evaluation.

I noticed that host memory was not freed when moving the model back and forth between devices:
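For reference, a minimal round-trip sketch that isolates the pattern (hypothetical model size, assuming a Linux host and an available CUDA device; the `rss_mib` helper is mine, not part of the project):

```python
import torch
import torch.nn as nn

def rss_mib() -> float:
    """Resident set size in MiB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS"):
                return int(line.split()[1]) / 1024.0
    return 0.0

# Hypothetical model; any module large enough to measure works
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

if torch.cuda.is_available():
    for i in range(10):
        model = model.cuda()  # move to GPU, as done before training
        model = model.cpu()   # move back; host RSS grows each round trip
        print(f"round trip {i}: RSS = {rss_mib():.1f} MiB")
else:
    print("CUDA not available; round trip cannot be reproduced here")
```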

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    47   3719.9 MiB   3719.9 MiB           1       @profile
    48                                             def train(self):
    49   3719.9 MiB      0.0 MiB           1           self.model = self.model.to(Config.device)
    50   3719.9 MiB      0.0 MiB           1           optimizer = torch.optim.Adam(self.model.parameters(), lr=Config.learning_rate)
    51   3719.9 MiB      0.0 MiB           1           self.model.train()

The model is moved back to the CPU when training is complete:

    69   3719.9 MiB      0.0 MiB           6               for epoch in range(epochs):
    70   3719.9 MiB      0.0 MiB           5                   batch_loss_list = []
    71   3719.9 MiB      0.0 MiB          60                   for data in self.loader:
    72   3719.9 MiB      0.0 MiB          55                       x = data[0].to(Config.device)
    73   3719.9 MiB      0.0 MiB          55                       y = data[1].to(Config.device)
    74   3719.9 MiB      0.0 MiB          55                       loss, y_ = self.train_batch(x, y)
    75   3719.9 MiB      0.0 MiB          55                       batch_loss_list.append(loss)
    76   3719.9 MiB      0.0 MiB           5                   mean_loss = np.mean(batch_loss_list)
    77   3719.9 MiB      0.0 MiB           5                   if mean_loss < Config.local_loss_threshold:
    78                                                             break
    79   3719.9 MiB      0.0 MiB           5                   self.schedule.step()
    80   3719.9 MiB      0.0 MiB           5                   self.logger.log_client_loss(self.client_id, epoch, np.mean(batch_loss_list).item())
    81   3721.7 MiB      1.8 MiB           1           self.model = self.model.cpu()
    82   3721.7 MiB      0.0 MiB           1           return loss

Each time the model is moved from CUDA back to the CPU, 1.6-1.9 MiB of RAM is allocated and never released.
Over many rounds this consumed all of my memory and crashed the program.
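One workaround that might avoid the repeated `.cpu()` allocations is to keep one persistent replica per device and copy the weights in place with `load_state_dict`, which writes into the destination module's existing tensors instead of allocating new ones. A sketch (the `copy_weights_` helper is my own name, not part of the project):

```python
import copy
import torch
import torch.nn as nn

def copy_weights_(src: nn.Module, dst: nn.Module) -> None:
    # load_state_dict copies values into dst's existing parameters
    # (handling cross-device copies), so no new host tensors are
    # allocated on each transfer
    dst.load_state_dict(src.state_dict())

cpu_model = nn.Linear(8, 4)             # persistent CPU copy kept in RAM
train_model = copy.deepcopy(cpu_model)  # persistent training replica,
if torch.cuda.is_available():           # placed on the GPU when possible
    train_model = train_model.cuda()

copy_weights_(cpu_model, train_model)   # before training
# ... train train_model ...
copy_weights_(train_model, cpu_model)   # after training, instead of .cpu()
```

This keeps the stored CPU model's tensors alive for the whole run, so the round trips no longer allocate fresh host memory.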