CUDA out of memory after more than 8000 iterations

When I train a model, the input code looks like this:
head = torch.from_numpy(head).to(self.device)
and the loss accumulation code looks like this:
loss_sum += loss.item()
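
For context, here is a minimal self-contained loop with the same structure (the model, data, and sizes are placeholders, not my real code, and `device` stands in for `self.device`):

import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(16, 1).to(device)            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

loss_sum = 0.0
for step in range(10000):
    head = np.random.randn(32, 16).astype(np.float32)    # placeholder batch
    target = np.random.randn(32, 1).astype(np.float32)

    head = torch.from_numpy(head).to(device)             # same pattern as above
    target = torch.from_numpy(target).to(device)

    optimizer.zero_grad()
    output = model(head)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()

    loss_sum += loss.item()                               # .item() returns a Python float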

But I get a “CUDA out of memory” error after 8784 iterations. Why does this happen?

Could you post your training code so that we can have a look at potential bugs?