Training gradually slows down with each batch

No, if a tensor does not have requires_grad set, its history is not built when you use it. Note that you cannot change this attribute after the forward pass to change how the backward behaves on an already created computational graph; it has to be set to False when you create the graph.
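For illustration, a minimal sketch (the tensor names here are made up for the example) showing that history is only recorded for tensors that require gradients, and that flipping the flag afterwards does not change a graph that was already built:

import torch

x = torch.randn(3)                      # requires_grad=False: no history is recorded for x
w = torch.randn(3, requires_grad=True)  # history is recorded for w

y = (w * x).sum()   # the graph only tracks w
y.backward()
print(w.grad)       # populated
print(x.grad)       # None: x was never part of the graph

x.requires_grad_(True)  # changing the flag now does not affect the graph already built for y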

2 Likes

I’m experiencing the same issue with PyTorch 0.4.1.
I implemented adversarial training with the cleverhans wrapper, and at each batch the training time increases.
How can I track the problem down to find a solution?

1 Like

Hi :sweat_smile: Why does the speed slow down when generating data on the fly (reading every batch from the hard disk while training)? Does that continue forever, or does the speed stay the same after a number of iterations?

I observed the same problem. The solution in my case was replacing itertools.cycle() around the DataLoader with a standard iter() and handling the StopIteration exception. You can also check whether /dev/shm usage increases during training.
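For reference, a minimal sketch of that change (the dataset and step count here are placeholders, not from the original post):

import torch
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(torch.randn(100, 4)), batch_size=10)

# Instead of: batches = itertools.cycle(loader)
data_iter = iter(loader)
for step in range(50):
    try:
        batch = next(data_iter)
    except StopIteration:
        data_iter = iter(loader)  # start a fresh pass when the loader is exhausted
        batch = next(data_iter)
    # ... training step on batch ...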

1 Like

Have the same issue:

0%| | 0/66 [00:00<?, ?it/s]
2%|▏ | 1/66 [05:53<6:23:05, 353.62s/it]
3%|▎ | 2/66 [06:11<4:29:46, 252.91s/it]
5%|▍ | 3/66 [06:28<3:11:06, 182.02s/it]
6%|▌ | 4/66 [06:41<2:15:39, 131.29s/it]
8%|▊ | 5/66 [06:43<1:34:15, 92.71s/it]
9%|▉ | 6/66 [06:46<1:05:41, 65.70s/it]
11%|█ | 7/66 [06:49<46:00, 46.79s/it]
12%|█▏ | 8/66 [06:51<32:26, 33.56s/it]
14%|█▎ | 9/66 [06:54<23:04, 24.30s/it]
15%|█▌ | 10/66 [06:57<16:37, 17.81s/it]
17%|█▋ | 11/66 [06:59<12:09, 13.27s/it]
18%|█▊ | 12/66 [07:02<09:04, 10.09s/it]
20%|█▉ | 13/66 [07:05<06:56, 7.86s/it]
21%|██ | 14/66 [07:07<05:27, 6.30s/it]

Cannot understand this behavior… sometimes a mini-batch takes 5 minutes, sometimes just a couple of seconds.

  • my first epoch took me just 5 minutes.

94%|█████████▍| 62/66 [05:06<00:15, 3.96s/it]
95%|█████████▌| 63/66 [05:09<00:10, 3.56s/it]
97%|█████████▋| 64/66 [05:11<00:06, 3.29s/it]
98%|█████████▊| 65/66 [05:14<00:03, 3.11s/it]

1 Like

It turned out the batch size matters. So my advice is to select a smaller batch size and also play around with the number of workers.

Hi, could you please explain how to clear the temporary computations?

thanks,

You should not keep a Tensor that has requires_grad=True from one iteration to the next. If you want to save it for later inspection (or to accumulate the loss), you should .detach() it first, so that PyTorch knows you won’t try to backpropagate through it.
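A minimal sketch of what that looks like in a training loop (the tiny linear model and random data are just placeholders for the example):

import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
saved_losses = []

for step in range(100):
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # detach before storing: keep only the value, not the autograd graph
    saved_losses.append(loss.detach())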

The answer comes from here - Why the training slow down with time if training continuously? And Gpu utilization begins to jitter dramatically?

I used torch.cuda.empty_cache() at the end of every loop.

My nn.Module had a variable which sits outside of the training loop but accumulates the gradient graph across loops, like this:

import torch

class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.variable = torch.tensor([1., 2.], requires_grad=True)
        self.bad_variable_used_across_loop = torch.tensor([-1.])

    def forward(self, x):
        # This attribute is overwritten with a result that depends on self.variable,
        # so it carries the autograd graph of this forward pass into the next one.
        self.bad_variable_used_across_loop = x @ self.variable + self.bad_variable_used_across_loop
        some_result = x @ self.variable + self.bad_variable_used_across_loop
        return some_result

Here I make bad_variable_used_across_loop an attribute of Foo only to record its value for further use. But this variable keeps the gradient graph flowing across batches!
To solve this, detach it in place at the end of each training loop with model.bad_variable_used_across_loop.detach_() (a bare .detach() only returns a new tensor and would not change the stored one).

import time

model = Foo()
for step in range(100000):
    start = time.time()
    x = torch.randn([10, 2])
    loss = model(x).sum()
    loss.backward()
    end = time.time()
    model.bad_variable_used_across_loop.detach_()  # detach in place so no graph is kept across steps
    print(f'step {step:05d}: {end-start:.2f}s')

Hi,

Small note:
If your “variable” is a learnt parameter, it should be an nn.Parameter.
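For example, a minimal sketch of the module above with the tensor registered as a learnable parameter (only the relevant line is shown):

import torch

class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Parameter requires grad by default, shows up in model.parameters(),
        # and is moved along with the module by .to() / .cuda()
        self.variable = torch.nn.Parameter(torch.tensor([1., 2.]))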

Hi, a similar issue occurs for me while training, but in my case, after I stop the process, load the latest checkpoint, and continue training, the speed becomes normal again. I know my problem can be worked around by restarting the process, but I’m wondering why this happens and how I can solve it properly. I am using a 2080 Ti, and this happens both while training YOLO and my customized model.
Thank you in advance.

Hi,

Given what you describe, I guess the issue is with the way you initialize your Parameters when there is no checkpoint to load.

Hey, could you please explain the usage of .detach() a bit more in the case of accumulating the loss? Sorry if my question is too basic; I’m still new to PyTorch.

epoch_loss = 0
n_train = len(train_loader)
with tqdm(total=n_train, desc=f'Epoch {epoch + 1}/{epochs}', unit='img') as pbar:
    for batch in train_loader:
        net.train()
        imgs = batch['image']
        true_masks = batch['labels']
        imgs = imgs.to(device=device, dtype=torch.float32)
        mask_type = torch.float32 if net.n_classes == 1 else torch.long
        true_masks = true_masks.to(device=device, dtype=mask_type)
        logits, probs, masks_pred = net(imgs)
        logits = torch.squeeze(logits, 1)
        loss = criterion(logits, true_masks)
        epoch_loss += loss.item() / n_train
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # to be continued

If I understood it correctly, in this case epoch_loss = 0 wouldn’t let the loss be saved from one iteration to another, and that is what is called .detach()?
Thank u.

Hi,

In this case, you don’t need to explicitly call .detach(), because you extract a Python number directly with .item(), and that breaks the graph.
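To make the distinction concrete, a minimal sketch (the tiny model and random data are placeholders) contrasting accumulating the Tensor itself with accumulating a Python number:

import torch

model = torch.nn.Linear(4, 1)
criterion = torch.nn.MSELoss()

bad_sum = 0     # accumulating the Tensor keeps every batch's graph alive
good_sum = 0.0  # accumulating a float (or a detached tensor) does not

for _ in range(3):
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    loss = criterion(model(x), y)
    bad_sum = bad_sum + loss        # the history of all batches is retained through bad_sum
    good_sum += loss.item()         # .item() returns a float, so this batch's graph can be freed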

1 Like

hi, thanks for your explanation. It makes sense to me now.

Hey, I am facing the same problem, where the training time for each batch grows within an epoch. I don’t quite understand how you solved this problem?

2 Likes

Setting pin_memory=True solved my problem.
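For reference, a minimal sketch of what that setting looks like (the dataset, batch size, and worker count are placeholders, and relate to the batch-size/workers advice earlier in the thread):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000, 4), torch.randn(1000, 1))
loader = DataLoader(dataset,
                    batch_size=32,    # smaller batch sizes were also suggested above
                    num_workers=4,    # worth tuning, as mentioned earlier
                    pin_memory=True)  # page-locked host memory for faster host-to-GPU copies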

Could you elaborate on that? I believe I am facing a similar issue but don’t know how to solve it.

Just ran into this issue on some code where ~50k minibatches are processed for training in each epoch.
Processing an epoch starts at ~13.6 it/s and ends up at ~6 it/s.
After some digging, it turns out this slowness is caused by this operation, executed every minibatch:

stats = stats + new_list

new_list holds 32 values. Unfortunately, this is a very expensive operation, since it creates a brand new list for stats, and the cost depends on the size of the list. stats is reset every epoch, but it still grows quickly while iterating over minibatches; toward the end of an epoch, stats holds about 1.6 million elements.
New elements should instead be added directly to the existing list stats using an in-place operation such as:

stats += new_list
stats.extend(new_list)
stats.append(1.)

These operations won’t create a new list.
Using the second method, stats.extend(new_list), keeps the processing time of each minibatch at the same level: ~13.6 it/s during the entire epoch.
Note: all values are detached and on CPU.
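For completeness, a minimal sketch showing the difference (the iteration count is scaled down from the post so it finishes quickly; the quadratic cost of rebuilding the list is already visible):

import time

new_list = [0.0] * 32

stats = []
start = time.time()
for _ in range(10000):
    stats = stats + new_list      # builds a brand new list every time: cost grows with len(stats)
print(f'concatenation: {time.time() - start:.2f}s')

stats = []
start = time.time()
for _ in range(10000):
    stats.extend(new_list)        # in-place: amortized constant cost per appended element
print(f'extend:        {time.time() - start:.2f}s')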