torch.cuda.amp.autocast memory leak

We experienced a memory leak when using the autocast functionality.

The code below resulted in the memory leak; without autocast it worked as expected.
Replacing loss.item() with loss.float() solved the issue for us and removed the memory leak when using autocast.

    from torch.cuda.amp import autocast

    epoch_loss = 0
    for input, target in dataloader:  # unpack inputs and targets
        with autocast():
            output = model(input)  # forward pass runs in mixed precision
        loss = loss_fn(output, target)
        epoch_loss += loss.item()  # this accumulation leaked memory under autocast
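
For reference, a minimal sketch of the same loop with that change applied; model, loss_fn, and dataloader are placeholders, and we have not confirmed the root cause of the leak:

    epoch_loss = 0
    for input, target in dataloader:
        with autocast():
            output = model(input)
        loss = loss_fn(output, target)
        # accumulating loss.float() instead of loss.item() avoided the leak for us;
        # note this makes epoch_loss a tensor rather than a Python float, and
        # depending on the setup, loss.detach().float() may be needed so the
        # accumulator does not retain the autograd graph across iterations
        epoch_loss += loss.float()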

PyTorch version: 1.10.0
We spent several days finding the issue, so hopefully this will help others solve their problem faster.

Could you provide a minimal, executable code snippet to reproduce the issue, please?