Merely instantiating a bunch of LSTMs on a CPU device seems to allocate memory in such a way that it’s never released, even after gc.collect(). The same code run on the GPU releases the memory after a
torch.cuda.empty_cache(). I haven’t been able to find any equivalent of
empty_cache() for the CPU.
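On the GPU side I can at least watch the caching allocator directly; this is roughly how I've been confirming that `empty_cache()` returns the memory (a minimal check using `torch.cuda.memory_reserved()`):

```python
import torch

if torch.cuda.is_available():
    x = torch.randn(1000, 1000, device='cuda')
    del x
    # The caching allocator still holds the freed block in its pool
    print(torch.cuda.memory_reserved())
    torch.cuda.empty_cache()
    # ...and returns it to the driver once the cache is emptied
    print(torch.cuda.memory_reserved())
```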
Is this expected behavior? My actual use case involves training several models at once on CPU cores in a Kubernetes deployment, and any training that involves LSTMs fills memory until the Kubernetes OOM killer evicts the pod. The models themselves are quite small (loading a trained model from disk takes very little memory), but all the memory used temporarily during training stays filled once training is done.
```python
import torch
import torch.nn as nn
import gc

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
throwaway = torch.ones((1, 1)).to(device)  # load CUDA context

class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, n_layers, dropout_perc):
        super().__init__()
        self.hidden_dim, self.n_layers = hidden_dim, n_layers
        self.rnn = nn.LSTM(input_dim, hidden_dim, n_layers, dropout=dropout_perc)

    def forward(self, x):
        outputs, (hidden, cell) = self.rnn(x)
        return hidden, cell

# Instantiate a pile of encoders, then drop every reference to them
pile = []
for i in range(500):
    pile.append(Encoder(102, 64, 4, 0.5).to(device))

del pile
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```
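To put numbers on the CPU case, the only proxy I know of is the process RSS, since PyTorch doesn't seem to expose a CPU allocator counter. A minimal measurement sketch, assuming `psutil` is installed:

```python
import gc

import psutil
import torch.nn as nn

proc = psutil.Process()  # current process

def rss_mb():
    # Resident set size of this process, in megabytes
    return proc.memory_info().rss / 1024 ** 2

print(f'before: {rss_mb():.1f} MB')
pile = [nn.LSTM(102, 64, 4, dropout=0.5) for _ in range(500)]
print(f'after alloc: {rss_mb():.1f} MB')
del pile
gc.collect()
print(f'after gc: {rss_mb():.1f} MB')  # on CPU this stays near the peak for me
```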
I’m running PyTorch 1.5.1 and Python 3.8.3 on Ubuntu 18.04 LTS.