How to fix profiler "assert not duplicate" error?

Hi,

I am testing out torch.profiler for the first time with BERT fine-tuning.

from datetime import datetime

import torch
from torch.profiler import profile, ProfilerActivity
from tqdm.auto import tqdm

def trace_handler(prof):
    # Export a Chrome trace and a memory timeline each time the
    # profiler finishes an active window.
    timestamp = datetime.now().strftime("%b_%d_%H_%M_%S")
    prof.export_chrome_trace(f"trace_{timestamp}.json")
    prof.export_memory_timeline(f"memory_{timestamp}.html", device="cuda:0")

training_losses = []

progress_bar = tqdm(range(num_training_steps))

active_steps = 10
repeat_cycles = 1  
total_profiled_steps = active_steps * repeat_cycles 
current_step = 1

model.train()
with profile(
    activities=[
        ProfilerActivity.CPU,
        ProfilerActivity.CUDA
    ],
    schedule=torch.profiler.schedule(wait=0, warmup=0, active=active_steps, repeat=repeat_cycles),
    on_trace_ready=trace_handler,
    record_shapes=True,
    profile_memory=True,
    with_stack=True,
) as prof:
    for batch in train_dataloader:
        if current_step >= total_profiled_steps:
            break
        prof.step()  # advance the profiler schedule
        current_step += 1

        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)

        training_losses.append(loss.item())

I only wanted to run the profiler for a few steps and then stop, but I always hit this error at the last step:

The last AssertionError sometimes changes (aten::sum, aten::to, etc.), but the first one is always the `assert not duplicate` line. I couldn't find any similar errors online, so I'm not sure what is happening.

I am using Google Colab (T4 GPU) with batch_size=4, fine-tuning on a multiple-choice task. I have tried changing the batch size, adding torch.cuda.synchronize(), and changing active_steps, but the error persists.

Does anyone know what is happening, and where I should read more about it?

Does this reproduce if you only run a single iteration? What about just two iterations?
Do you have a self-contained repro that can reproduce this?

When trying to reproduce the error with different iterations:

  • If I set active_steps <= 3, the error does NOT occur; from 4 onwards it does.
  • If I set active_steps > 3 (for example, 10) and break after a few iterations, the error occurs from step 3 onwards (breaking after 1 or 2 iterations seems fine).
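To sanity-check the step counting, torch.profiler.schedule is itself a callable that maps a step index to a profiler action, so it can be inspected directly without running any model (a quick sketch using the same values as above):

```python
# Sketch: inspect what the schedule used above does at each step index.
# torch.profiler.schedule returns a callable mapping step -> ProfilerAction.
from torch.profiler import ProfilerAction, schedule

sched = schedule(wait=0, warmup=0, active=10, repeat=1)
actions = [sched(i) for i in range(12)]

# Steps 0-8 record, step 9 records and saves (on_trace_ready fires there),
# and steps 10+ are NONE because repeat=1 limits profiling to one cycle.
assert actions[0] == ProfilerAction.RECORD
assert actions[9] == ProfilerAction.RECORD_AND_SAVE
assert actions[10] == ProfilerAction.NONE
```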

I tried reproducing the same error with a simpler dataset (just some random numbers) and model (a 2-layer neural network), but it did not occur. So I do not have a small, self-contained repro.
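The simpler setup I tried looked roughly like this (the sizes are placeholders, profiling is CPU-only so it runs anywhere, and trace_handler is a stub standing in for the export calls):

```python
# Minimal repro attempt: random data + 2-layer net under the same schedule.
import torch
from torch.profiler import ProfilerActivity, profile, schedule

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2)
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

trace_count = 0
def trace_handler(prof):
    # Stub standing in for export_chrome_trace / export_memory_timeline.
    global trace_count
    trace_count += 1

active_steps = 10
with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=0, warmup=0, active=active_steps, repeat=1),
    on_trace_ready=trace_handler,
    record_shapes=True,
    with_stack=True,
) as prof:
    for step in range(active_steps):
        x = torch.randn(4, 16)            # random "dataset"
        y = torch.randint(0, 2, (4,))
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        prof.step()                       # advance the profiler schedule

# trace_handler should fire exactly once for this schedule
```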

The full code fine-tunes BERT on a multiple-choice task, which is longer and uses a larger dataset. I'm unsure whether the problem is the dataset (COPA) or the Hugging Face fine-tuning process. Do you have any suggestions for debugging this further?

Thanks for trying to make a smaller reproducer. I'm not too sure what is going on unfortunately, but it seems unexpected, and we should at a minimum improve the error message.