Increasing CPU memory during LSTM training

I ran into a memory problem when implementing a certain network in PyTorch. To locate it, I checked the memory usage of a very simple single-layer LSTM network and found that the memory grows from epoch to epoch during the early epochs, while for the same LSTM network in TensorFlow the memory stays constant (at roughly the size of the first PyTorch epoch).
Moreover, the problem gets worse as the hidden size grows.
Could someone help me figure out what is going on? Thanks a lot!

The attached pictures show the details.
By the way, don't worry about the difference in loss; I didn't average over all batches in the TensorFlow implementation.

Thanks for your attention again!

Code for the training step:

for epoch in range(num_epochs):
    start = time.time()
    i = 0
    loss_sum = 0
    total_round = 0
    while i < self.train_size:
        self.rnn.train()
        self.optim.zero_grad()
        batch_end = i + batch_size
        if batch_end >= self.train_size:
            batch_end = self.train_size
        var_x = self.to_variable(x_train[i: batch_end])
        var_y = self.to_variable(y_train[i: batch_end])
        var_y_seq = self.to_variable(y_seq_train[i: batch_end])
        if var_x.dim() == 2:
            var_x = var_x.unsqueeze(2)

        y_res = self.rnn(var_x)
        var_y = var_y.view(-1, 1)
        loss = self.loss_func(y_res, var_y)
        loss.backward()
        self.optim.step()
        loss_sum += loss.detach().numpy()
        i = batch_end
        total_round += 1
    end = time.time()
    print('epoch [{}] finished, the average train loss is {}, with time: {}'.format(epoch, loss_sum / total_round, end - start))

    memory_usage()

    train_loss_list.append(loss_sum / total_round)

    start = time.time()
    self.rnn.eval()

    test_y_res = self.rnn(var_x_test)
    test_loss = self.loss_func(test_y_res, var_y_test)
    test_loss_list.append(test_loss.detach().numpy())
    end = time.time()
    print('the average test loss is {}, with time: {}'.format(float(test_loss), end - start))
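
For reference, the memory_usage() call in the loop above just prints the resident memory of the training process after each epoch. My exact helper isn't shown here, but a minimal sketch of such a function (assuming psutil is installed) looks like this:

import os

import psutil


def memory_usage():
    # Print the resident set size (RSS) of the current Python process in MB.
    process = psutil.Process(os.getpid())
    rss_mb = process.memory_info().rss / (1024 ** 2)
    print('current process RSS: {:.2f} MB'.format(rss_mb))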

Code for my RNN model:

import torch.nn as nn


class my_rnn(nn.Module):

    def __init__(self, input_size, hidden_size, time_step):
        super(my_rnn, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.T = time_step

        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=1, batch_first=True)
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, driving_x):
        lstm_out, hidden = self.lstm(driving_x)
        output = self.linear(lstm_out[:, -1, :])
        return output
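
In case it helps, here is roughly how I drive the model when measuring memory. This is a minimal, self-contained sketch with made-up shapes and hyperparameters (not my full training class), but it follows the same per-epoch measurement pattern as the training loop above:

import os

import psutil
import torch
import torch.nn as nn

# Hypothetical sizes chosen only for illustration.
input_size, hidden_size, time_step = 1, 128, 10
batch_size, num_epochs = 64, 10

model = my_rnn(input_size, hidden_size, time_step)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_func = nn.MSELoss()

# Random data standing in for the real training set.
x_train = torch.randn(1024, time_step, input_size)
y_train = torch.randn(1024, 1)

for epoch in range(num_epochs):
    model.train()
    for start in range(0, x_train.size(0), batch_size):
        xb = x_train[start:start + batch_size]
        yb = y_train[start:start + batch_size]
        optim.zero_grad()
        loss = loss_func(model(xb), yb)
        loss.backward()
        optim.step()
    # Report process memory after every epoch, as in the training loop above.
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / (1024 ** 2)
    print('epoch {}: RSS {:.2f} MB'.format(epoch, rss_mb))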