Normalising images in Cityscapes

Hi, I am working on the Cityscapes dataset and looked up published mean and std values for its images in order to normalise them. However, when I normalised the images with those values and then checked the result, the mean and std were not close to 0 and 1. So I skipped normalisation, since I am not using a pretrained model but training a U-Net from scratch.
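For reference, here is a minimal sketch of how per-channel statistics could be computed directly from the training images, so that they match the scale the model actually sees (for example [0, 1] after converting to tensor). The names `channel_stats` and `normalise` are hypothetical helpers, and the random arrays only stand in for the real Cityscapes batches; this is an illustration under those assumptions, not the official normalisation for the dataset.

```python
import numpy as np

def channel_stats(batches):
    """Accumulate per-channel mean and std over an iterable of image batches.

    Each batch is assumed to have shape (N, C, H, W) and to be scaled the
    same way the model will see it (e.g. [0, 1] after conversion to tensor).
    """
    count = 0
    total = None
    total_sq = None
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        n, c, h, w = batch.shape
        if total is None:
            total = np.zeros(c)
            total_sq = np.zeros(c)
        # Sum over batch, height and width; keep the channel axis.
        total += batch.sum(axis=(0, 2, 3))
        total_sq += (batch ** 2).sum(axis=(0, 2, 3))
        count += n * h * w
    mean = total / count
    # Population std via E[x^2] - E[x]^2.
    std = np.sqrt(total_sq / count - mean ** 2)
    return mean, std

def normalise(batch, mean, std):
    """Subtract the per-channel mean and divide by the per-channel std."""
    return (batch - mean[None, :, None, None]) / std[None, :, None, None]

# Synthetic stand-in for real image batches (assumption for illustration).
rng = np.random.default_rng(0)
fake_batches = [rng.random((4, 3, 8, 8)) for _ in range(5)]

mean, std = channel_stats(fake_batches)
normalised = normalise(np.concatenate(fake_batches), mean, std)
```

If the statistics are computed on images in the 0-255 range but applied to tensors in [0, 1] (or the other way round), the normalised output will not have mean 0 and std 1, which is one possible explanation for the mismatch described above.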

Without normalisation, each epoch takes roughly 145 seconds to train on images resized to 256x256, with batch_size = 32 and learning rate = 10^{-4}. I am wondering whether this training time is normal for a segmentation dataset of almost 2600 images, or whether it is too long. I increased num_workers to 8 for one GPU, and that gave the fastest training I could obtain. I also tried DataParallel with 3 GPUs and increased num_workers to 12 (4 times the number of GPUs), but I did not observe a significant speed-up.
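One way to see where the epoch time goes is to measure separately how long the loop waits for data and how long it spends in the training step. The sketch below is framework-agnostic; `profile_epoch`, `slow_loader` and `fake_train_step` are hypothetical names, and the sleeps only simulate work.

```python
import time

def profile_epoch(loader, train_step):
    """Return (seconds waiting for batches, seconds spent in train_step)."""
    data_time = 0.0
    step_time = 0.0
    t0 = time.perf_counter()
    for batch in loader:
        t1 = time.perf_counter()
        data_time += t1 - t0          # time spent fetching this batch
        # Note: on a GPU, kernels run asynchronously, so the step should
        # synchronise (e.g. torch.cuda.synchronize()) before returning,
        # otherwise compute time leaks into the next measurement.
        train_step(batch)
        t0 = time.perf_counter()
        step_time += t0 - t1          # time spent in forward/backward/update
    return data_time, step_time

def slow_loader(num_batches=5, delay=0.01):
    """Simulated data loader that takes `delay` seconds per batch."""
    for i in range(num_batches):
        time.sleep(delay)
        yield i

def fake_train_step(batch, delay=0.002):
    """Simulated training step."""
    time.sleep(delay)

data_time, step_time = profile_epoch(slow_loader(), fake_train_step)
print(f"data: {data_time:.3f}s  step: {step_time:.3f}s")
```

If most of the time is spent waiting for data, adding GPUs is unlikely to help, which would be consistent with the lack of speed-up from DataParallel; if most of it is in the step, the model or optimiser is the place to look.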

I should add that with a fixed learning rate each epoch takes about 50 seconds, which I think is reasonable. But with a fixed learning rate I have an overfitting problem, and the usual techniques for dealing with overfitting did not work. Since I do not want to make the model simpler, I used a cyclic learning rate schedule: at the beginning of each cycle I set the learning rate as high as 10^{-1}, and then reduce it to 10^{-4} over 30-50 epochs using cosine annealing. I do not use the PyTorch implementation but an implementation based on this paper, which is compatible with the model I am working on. With this schedule the loss decreases very smoothly, but the time per epoch increases from 50 seconds to 145 seconds, whereas I expected training to get faster since I increase the learning rate at the start of each cycle. I am wondering whether this slowdown is due to the lack of normalisation or is entirely natural for this dataset. I do not use any augmentation except resizing and converting to tensor. If normalisation is the issue, can anyone tell me how the images of this dataset are normalised?
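To make the schedule concrete, here is a minimal sketch of cosine annealing with restarts using the values mentioned above (10^{-1} down to 10^{-4}). It uses the common cosine formula and an assumed cycle length of 30 epochs; it is not the implementation from the paper referred to above, and `cosine_restart_lr` is a hypothetical name. PyTorch ships a comparable scheduler, `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.

```python
import math

def cosine_restart_lr(epoch, lr_max=1e-1, lr_min=1e-4, cycle_len=30):
    """Learning rate for `epoch` under cosine annealing with restarts.

    The rate starts at lr_max at the beginning of each cycle and decays
    towards lr_min along half a cosine wave, then jumps back to lr_max.
    """
    t = epoch % cycle_len  # position within the current cycle
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * t / cycle_len))
    return lr_min + (lr_max - lr_min) * cos_factor

# Two full cycles of 30 epochs each.
schedule = [cosine_restart_lr(e) for e in range(60)]

for e in (0, 15, 29, 30):
    print(f"epoch {e:2d}: lr = {schedule[e]:.6f}")
```

The schedule itself is a single scalar computed once per epoch, so it adds negligible work on its own; a change in time per epoch would more likely come from something else that differs between the two runs.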

Thank you in advance.