CIFAR10: adding perturbations slows down cudnn_convolution

I generate a set of noise tensors and add them to CIFAR-10. But after adding the noise, the entire training time increased roughly three times, and memory consumption went up as well.
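For scale, here is my rough memory math (just back-of-the-envelope, assuming the standard 50000 x 32 x 32 x 3 training set): keeping a float32 copy of the data, or a float32 noise array of the same shape, takes four times the uint8 storage.

```python
# Rough memory math for CIFAR-10-sized arrays (50000 x 32 x 32 x 3 values).
n_values = 50_000 * 32 * 32 * 3
uint8_mib = n_values / 2**20        # 1 byte per value
float32_mib = n_values * 4 / 2**20  # 4 bytes per value
print(f"uint8:   {uint8_mib:.0f} MiB")    # -> uint8:   146 MiB
print(f"float32: {float32_mib:.0f} MiB")  # -> float32: 586 MiB
```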

I didn't know what was happening, so I used the profiler to see which module costs the most time. I found that cudnn_convolution consumed more time, but I don't understand where this extra time comes from.
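For reference, this is roughly how I profiled (a minimal sketch using `torch.profiler`; the `Conv2d` layer and random input here are placeholders, not my real model):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder model and input just to illustrate the profiling call.
model = torch.nn.Conv2d(3, 16, kernel_size=3)
x = torch.randn(8, 3, 32, 32)

# Profile CPU ops; add ProfilerActivity.CUDA when running on GPU.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    y = model(x)

# Print the ops sorted by total time; on GPU this is where
# cudnn_convolution shows up.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```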

Here is my dataset code that adds the noise (imports included for completeness):

import numpy as np
import torch
from PIL import Image
from torchvision.datasets import CIFAR10
from tqdm import tqdm

class SSLPerturbationDataset(CIFAR10):
    def __init__(self, root: str, train: bool = True, transform=None,
                 target_transform=None, download: bool = False,
                 noises: torch.Tensor = None) -> None:
        super().__init__(root, train, transform, target_transform, download)
        self.data = self.data.astype(np.float32)
        self.labels = self.targets
        # noises is a float tensor in [0, 1]; scale to [0, 255] and move the
        # channel axis last to match the HWC layout of self.data.
        if noises.dim() == 4:  # (N, C, H, W)
            self.noises = noises.mul(255).clamp_(
                0, 255).permute(0, 2, 3, 1).cpu().numpy()
        else:  # (N, K, C, H, W)
            self.noises = noises.mul(255).clamp_(
                0, 255).permute(0, 1, 3, 4, 2).cpu().numpy()
        with torch.no_grad():
            for idx in tqdm(range(len(self)), desc='add noise'):
                self.data[idx] = np.clip(
                    self.data[idx] + self.noises[idx], a_min=0, a_max=255)
        self.data = self.data.astype(np.uint8)

    def __getitem__(self, index: int):
        img, target = self.data[index], self.targets[index]
        img = Image.fromarray(img)

        # Apply the transform twice to produce two augmented views (SSL).
        if self.transform is not None:
            img1 = self.transform(img)
            img2 = self.transform(img)
        else:
            img1 = img2 = img

        if self.target_transform is not None:
            target = self.target_transform(target)

        return img1, img2

I doubt that any GPU operation would be affected by an offline CPU transformation, so could you post a minimal, executable code snippet showing the slowdown, please?
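Something along these lines would be enough (a minimal sketch, assuming additive noise; `time_convs` is a hypothetical helper name, and you would run it on the GPU with proper `torch.cuda.synchronize()` calls to time CUDA work correctly):

```python
import time
import torch

def time_convs(x, iters=20):
    # Tiny timing harness: average forward-pass time of one conv layer.
    conv = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1)
    with torch.no_grad():
        for _ in range(3):   # warm-up iterations
            conv(x)
        t0 = time.perf_counter()
        for _ in range(iters):
            conv(x)
    return (time.perf_counter() - t0) / iters

clean = torch.randn(32, 3, 32, 32)
noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(-1, 1)
print(f"clean: {time_convs(clean) * 1e3:.2f} ms, "
      f"noisy: {time_convs(noisy) * 1e3:.2f} ms")
```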