Grad becomes NaN for all parameters

Thank you for your quick response!

Yes, I have normalized my input data and set self.norm_pix_loss = True. Here is my data augmentation pipeline; the mean and std are computed on my own dataset.

    transform_train = transforms.Compose([
        transforms.RandomResizedCrop(args.input_size, scale=(0.2, 1.0), interpolation=3),  # 3 is bicubic
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.4271, 0.4054, 0.4118], std=[0.2124, 0.2165, 0.2112]),
    ])
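As a sanity check on these statistics, here is a minimal sketch of the range the normalized pixels can take. It assumes the inputs are ordinary 8-bit images, which ToTensor() scales to [0, 1] before Normalize applies (x - mean) / std per channel. The helper name `normalized_range` is made up for illustration and is not part of torchvision.

```python
# Sketch: per-channel value range after ToTensor() + Normalize(mean, std).
# Assumption: ToTensor() receives 8-bit images, so its output lies in [0, 1].
MEAN = [0.4271, 0.4054, 0.4118]  # values from the transform above
STD = [0.2124, 0.2165, 0.2112]


def normalized_range(mean, std):
    """Return per-channel (min, max) of (x - mean) / std for x in [0, 1].

    Hypothetical helper for illustration only.
    """
    if len(mean) != len(std):
        raise ValueError("mean and std must have the same number of channels")
    ranges = []
    for m, s in zip(mean, std):
        if s <= 0:
            raise ValueError("std must be positive")
        # Smallest value comes from x = 0, largest from x = 1.
        ranges.append(((0.0 - m) / s, (1.0 - m) / s))
    return ranges


if __name__ == "__main__":
    for channel, (low, high) in enumerate(normalized_range(MEAN, STD)):
        print(f"channel {channel}: [{low:.3f}, {high:.3f}]")
```

With these values every channel stays within about three standard deviations of zero, so under the assumption above the transform by itself should not introduce inf or NaN values into the inputs.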