100% GPU utilization at the beginning of training, but it becomes extremely unstable later

Hi,
I converted the ImageNet dataset into an LMDB database. During training, GPU utilization sits stably at around 99% in the beginning, but after a few hundred iterations it starts jumping between 99% and 0%, and the training speed slows down significantly.
I have run this code several times before and never observed this behavior, and the code has not been changed. The dataset and dataloader are shown below.

import lmdb
import cv2
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

# normalize, bs_train and n_worker are defined elsewhere in the script

class train_dataset(Dataset):
    def __init__(self):
        super(train_dataset, self).__init__()
        self.root = r'/data/imagenet/train/'
        # open the LMDB environment once and keep a read-only transaction
        self.env = lmdb.open(self.root)
        self.txn = self.env.begin(write=False)
        self.transforms = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize
        ])

    def __getitem__(self, i):
        image_bin = self.txn.get((str(i) + '_img').encode())
        # another way to open the binary data; it shows the same behavior:
        # image = Image.open(BytesIO(image_bin))
        # if image.mode != 'RGB':
        #     image = image.convert('RGB')
        image_buf = np.frombuffer(image_bin, dtype=np.uint8)
        image = cv2.imdecode(image_buf, cv2.IMREAD_COLOR)
        image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        image = self.transforms(image)
        label = int(self.txn.get((str(i) + '_label').encode()).decode())
        return {'image': image, 'label': label}

    def __len__(self):
        # each sample stores two LMDB entries: '<i>_img' and '<i>_label'
        return self.txn.stat()['entries'] // 2


loader_train = torch.utils.data.DataLoader(
    train_dataset(), batch_size=bs_train, shuffle=True,
    num_workers=n_worker, pin_memory=True)
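
To rule out the data pipeline itself, one thing I can try is to iterate over the loader without the model and time the batches. This is just a rough sketch, not part of my actual training script; it reuses the same bs_train and n_worker values as above:

import time

loader = torch.utils.data.DataLoader(
    train_dataset(), batch_size=bs_train, shuffle=True,
    num_workers=n_worker, pin_memory=True)

start = time.time()
for it, batch in enumerate(loader):
    # no model or GPU work here, so any stalls come from the LMDB reads,
    # image decoding, or the transforms
    if it > 0 and it % 100 == 0:
        print(it, (time.time() - start) / 100, 'sec/iter')
        start = time.time()

If the stalls show up here as well, the problem would be in the loading/decoding side rather than in the training step.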

What are the possible causes?
Sincerely

Is the decrease in GPU utilization observable after an approximately fixed number of iterations, or does your code slow down consistently?
Are you storing some variables somewhere, e.g. losses.append(loss)? (See the sketch below for why this matters.)
Do you see increased GPU or CPU memory usage?
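
The reason I ask about losses.append(loss): if you append the loss tensor itself, each stored element keeps a reference to its autograd graph (and the GPU memory that graph holds), so memory usage can grow over iterations. A minimal sketch of the difference, where model, criterion, optimizer and the loop itself are placeholders, not your actual code:

losses = []
for batch in loader_train:
    images = batch['image'].cuda(non_blocking=True)
    labels = batch['label'].cuda(non_blocking=True)

    output = model(images)            # placeholder model
    loss = criterion(output, labels)  # placeholder criterion

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # losses.append(loss)        # keeps the whole computation graph alive
    losses.append(loss.item())   # stores only the Python float and lets the graph be freed

You could also print torch.cuda.memory_allocated() every few hundred iterations to check whether allocated GPU memory keeps growing.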

Thanks.
It becomes unstable after an approximately fixed number of iterations.
Yes, I used losses.append(loss), but memory usage is constant. How does storing variables affect GPU utilization?

I forgot to mention that I store losses in CPU memory.