Training time increasing with steps

Hello! I have the following problem: my torch model slows down towards the end of an epoch and then runs fast again at the start of the next one.
I use tqdm to measure iterations per second and see the following picture: at the start of training the throughput is about 20 it/s, but it drops as the iteration count grows and ends up at about 4 it/s. In the next epoch it starts from 20 again and falls back to 4 by the end. I have removed everything extra from the training loop, so it now looks like the one in the PyTorch tutorial, with no additional steps.
I use a DataLoader with num_workers>1 and pin_memory. I have about 250 GB of data, which is passed in at init, so my Dataset just returns an index: no data is loaded from disk and no transformations are applied; everything is stored in __init__ in the form it is passed to forward.
How can I find the bottleneck in my model?

Could you check if you are appending some tensors to a list and could thus be storing the computation graphs throughout the epoch? This should be visible as increased memory usage during the first epoch. Also, are you using backward(retain_graph=True) or any other “special” setup?
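
As a rough illustration (a minimal sketch with made-up tensors, not the code in question): appending a loss tensor that still carries its grad_fn keeps the whole computation graph alive, while storing a detached tensor or a plain Python float via .item() does not:

    import torch

    preds = torch.randn(8, 1, requires_grad=True)
    losses_bad, losses_ok = [], []
    for _ in range(3):
        loss = (preds ** 2).mean()
        losses_bad.append(loss)          # keeps the computation graph alive
        losses_ok.append(loss.detach())  # or loss.item(): stores only the value
    print(losses_bad[0].grad_fn)  # <MeanBackward0 ...> -> graph is retained
    print(losses_ok[0].grad_fn)   # None -> nothing to backpropagate through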

Thanks for the answer!
I haven’t seen anything strange in the training loop:

    with tqdm(total=len(train_generator)) as prbar:
        for batch_idx, batch in enumerate(train_generator, 1):
            output = self._model(batch[:3]).squeeze(dim=1)
            target = batch[3].to(self._device)
            loss = loss_func(output, target)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

            total_loss += float(loss.item()) * target.size(0)
            y_true.extend(list(target.data.cpu().numpy()))
            y_pred.extend(list(output.data.cpu().numpy()))
            l1_part += float(l1_weight * weights_norm_1 / float(loss) * 100)
            l2_part += float(l2_weight * weights_norm_2 / float(loss) * 100)

Could anything in the implemented model class be slowing down the performance?

Where are these values or tensors coming from?

            l1_part += float(l1_weight * weights_norm_1 / float(loss) * 100)
            l2_part += float(l2_weight * weights_norm_2 / float(loss) * 100)

Could you check if some of them are attached to a computation graph and would thus have a valid .grad_fn?
If so, detach() them before accumulating.
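
For illustration, a minimal self-contained sketch of that suggestion (the tiny Linear model and the hyperparameter value are made up; only the detach-before-accumulating pattern matters):

    import torch

    model = torch.nn.Linear(4, 1)  # stand-in for the real model
    l1_weight, l1_part = 1e-3, 0.0

    out = model(torch.randn(8, 4))
    loss = out.pow(2).mean()
    weights_norm_1 = sum(p.norm(1) for p in model.parameters())
    loss = loss + l1_weight * weights_norm_1
    loss.backward()

    print(weights_norm_1.grad_fn)  # a Function object: attached to the graph
    # detach before accumulating the logging statistic, so no graph is kept around
    l1_part += float(l1_weight * weights_norm_1.detach() / loss.detach() * 100)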

Yes, if your model increases the workload in each iteration, e.g. by using longer sequences or by backpropagating through all previous iterations, you would see a slowdown.
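
A toy example of the second case (made-up sizes, unrelated to the model in question): if a tensor carried across iterations is never detached, every backward() walks through all previous steps and the per-iteration time keeps growing:

    import time

    import torch

    w = torch.randn(1000, 1000, requires_grad=True)
    state = torch.zeros(1000)

    for step in range(50):
        t0 = time.time()
        state = torch.tanh(w @ state + 1)  # graph keeps growing across steps
        loss = state.sum()
        # retain_graph=True is needed here precisely because earlier steps
        # are part of the graph; backward gets slower with every iteration
        loss.backward(retain_graph=True)
        if step % 10 == 0:
            print(step, f"{time.time() - t0:.4f}s")
    # detaching the carried tensor (state = state.detach()) keeps each step's
    # graph at a constant size and the per-step time roughly flat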

This part is just additional regularization + sample weights:
    if use_weights:
        weights = batch[4].to(self._device)
        loss = (loss * weights).mean()
    if l1_weight:
        weights_norm_1 = 0
        for i, params in enumerate(self._model.parameters(), 0):
            if i != 0:  # not embedding layer
                weights_norm_1 += torch.norm(params, 1)
        loss = loss + l1_weight * weights_norm_1
How can I use .grad_fn to check whether they have children?

You can print it directly as it’s a tensor attribute:

print(loss.grad_fn)
print(l1_weight.grad_fn)
...
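
If the output is a Function object such as <AddBackward0 object at 0x...>, the tensor is still attached to a computation graph; a detached tensor (or a leaf created without any operation) prints None, and a plain Python float has no grad_fn attribute at all.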

Thanks, will try to do this.
We have two similar models (torch/keras), and right now the Keras one is about twice as fast (Keras ~23 min per epoch vs. torch ~40-50 min), which looks a bit strange.