PyTorch is getting slower batch by batch within one epoch

Hi
I am using an LSTM to deal with sequences (a sequence-to-sequence model). My whole training set contains about 7000 sequences of variable length, so I use the Dataset and DataLoader classes to feed my data into the model, with batch_size equal to one.
The problem I have is that the processing time is not equal for each batch; it keeps increasing. For example, in the beginning, processing 50 sequences needed only 1 minute, but after 2000 sequences, processing 50 sequences took about half an hour, and the time keeps increasing. I don’t know what causes this. I also noticed that the memory consumption was increasing, which should not happen. Any ideas?
Thanks!


If the memory consumption is gradually increasing, I reckon you are saving/generating some variables between iterations without releasing them. Check the code carefully to see whether that’s the case.
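
For example, a quick way to confirm the growth is to log the per-batch time and allocated GPU memory every few iterations (a rough sketch, assuming a CUDA device and a PyTorch version that provides torch.cuda.memory_allocated; the loader name and loop body are just placeholders for your own training step):

import time
import torch

start = time.time()
for batch_idx, batch_sample in enumerate(train_loader):
    # ... your training step ...
    if batch_idx > 0 and batch_idx % 50 == 0:
        elapsed = time.time() - start
        mem_mb = torch.cuda.memory_allocated() / 1024 ** 2
        print('batch %d: %.1fs for last 50 batches, %.1f MB allocated' % (batch_idx, elapsed, mem_mb))
        start = time.time()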

Try the following first:
torch.backends.cudnn.enabled = False

Also separately maybe try:
torch.backends.cudnn.benchmark = True

This might help
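
A rough sketch of where these settings would go (at the top of the script, before the model is built); note that with variable-length sequences, benchmark = True can actually make things slower, because cuDNN re-benchmarks every time it sees a new input shape:

import torch

# Run 1: rule out cuDNN entirely (slower, but a useful sanity check)
torch.backends.cudnn.enabled = False

# Run 2, separately: let cuDNN pick and cache the fastest algorithms.
# This helps most when input shapes are fixed; with variable-length
# sequences, every new length triggers a fresh benchmarking pass.
# torch.backends.cudnn.benchmark = True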

Thank you, Devansh! I will check them.
You remind me that when I use the dataset and dataloader, I generate one Variable for the sequence and one Variable for the label in each batch. I don’t explicitly release them, and I haven’t seen any examples that do. I think that might be the problem. Do you know how to release them, please?

If you continuously accumulate the results of Variables, you never free your computation graph.
A common error is to accumulate the total loss for an epoch/training run as follows:

total_loss += loss

This holds on to your Variables, preventing them from ever being freed. In this case you should instead do the following (because you don’t want to backprop through total_loss, only print its value):

total_loss += loss.data[0]
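
As a minimal, self-contained sketch of the difference with a toy model (the model and data here are purely illustrative; on current PyTorch versions, loss.item() plays the role of loss.data[0]):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

total_loss = 0.0
for step in range(100):
    inputs = torch.randn(4, 10)
    label = torch.randint(0, 2, (4,))

    optimizer.zero_grad()
    loss = criterion(model(inputs), label)
    loss.backward()
    optimizer.step()

    # Bad:  total_loss += loss   -> keeps every batch's graph alive
    # Good: take a plain Python number so the graph can be freed
    total_loss += loss.item()  # loss.data[0] in the old Variable API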

Thank you Francisco.
But I didn’t do that. Below is part of my code

for batch_idx, batch_sample in enumerate(dataset_loader_train):
    inputs, label = batch_sample['seq'], batch_sample['label']
    if USE_CUDA:
        inputs, label = inputs.cuda(), label.cuda()
    label = label.view(inputs.size()[1])
    inputs = inputs.view(inputs.size()[1], 1, -1)
    inputs, label = Variable(inputs), Variable(label)
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, label)
    loss.backward(retain_graph=True)
    optimizer.step()

Then maybe somewhere in your model you are doing something similar, holding on to tensors that keep history, and thus when you backprop you have huge graphs that grow over time.

I couldn’t spot anything abnormal… Below is my model

class PredRNN(nn.Module):
    def __init__(self, rnn_type, input_size, hidden_size, output_size, n_layers=1, dropout=0.0,
                 bidirectional=False):
        super(PredRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers
        self.rnn_type = rnn_type
        self.dropout = dropout
        if bidirectional:
            self.multiply = 2
        else:
            self.multiply = 1
        if self.rnn_type == 'lstm':
            self.rnn = nn.LSTM(self.input_size, self.hidden_size, self.n_layers, dropout=self.dropout)
        elif self.rnn_type == 'gru':
            self.rnn = nn.GRU(self.input_size, self.hidden_size, self.n_layers, dropout=self.dropout)
        else:
            self.rnn = nn.LSTM(self.input_size, self.hidden_size, self.n_layers, dropout=self.dropout)

        self.rnn2out = nn.Linear(self.hidden_size * self.multiply, self.output_size)
        self.hidden = self.init_hidden()

    def forward(self, inputs):
        output, self.hidden = self.rnn(inputs, self.hidden)
        output = self.rnn2out(output.view(len(inputs), -1))
        output = F.log_softmax(output)
        return output

    def init_hidden(self):
        if self.rnn_type == 'lstm':
            h0 = Variable(torch.zeros(self.n_layers * self.multiply, 1, self.hidden_size))
            c0 = Variable(torch.zeros(self.n_layers * self.multiply, 1, self.hidden_size))
            if USE_CUDA:
                hidden = (h0.cuda(), c0.cuda())
            else:
                hidden = (h0, c0)
        elif self.rnn_type == 'gru':
            hidden = Variable(torch.zeros(self.n_layers * self.multiply, 1, self.hidden_size))
            if USE_CUDA:
                hidden = hidden.cuda()
        else:
            h0 = Variable(torch.zeros(self.n_layers * self.multiply, 1, self.hidden_size))
            c0 = Variable(torch.zeros(self.n_layers * self.multiply, 1, self.hidden_size))
            if USE_CUDA:
                hidden = (h0.cuda(), c0.cuda())
            else:
                hidden = (h0, c0)

        return hidden

I don’t know much about RNNs, but your model looks fine.
After a second glance at your code, did you try removing the retain_graph=True from your backward pass? It is only necessary if you are doing double backprop or some other tricks.
I’m not sure if that’s the reason or not.
If the issue still persists, could you try creating a small example (~100 lines) that reproduces your problem?
Thanks!

Actually, this is strange, because I haven’t seen any other examples using retain_graph=True, but if I remove it, I get this error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
I will try your advice. Thank you Francisco!

Hi, it is solved based on the idea in https://github.com/pytorch/pytorch/issues/2769
What I did in my model’s forward is:
output, self.hidden = self.rnn(inputs, (self.hidden[0].detach(), self.hidden[1].detach()))

I have no idea why this makes a huge difference…
Thanks!
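
For anyone who hits the same thing: because self.hidden is carried over from the previous batch without being detached, each batch’s graph stays connected to all the earlier ones, so the graph (and the memory and backward time) grows every iteration, and backward() then demands retain_graph=True. A rough sketch of the same fix as a reusable helper (the name repackage_hidden is just illustrative, and this assumes a PyTorch version where hidden states are plain tensors; it handles both the LSTM tuple and the GRU tensor):

import torch

def repackage_hidden(h):
    # Detach the hidden state from the graph of previous batches,
    # so backward() only has to traverse the current batch's graph.
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_hidden(v) for v in h)

# inside PredRNN.forward():
#     self.hidden = repackage_hidden(self.hidden)
#     output, self.hidden = self.rnn(inputs, self.hidden)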
