Yes, you could use these values, but note that the normalization would then effectively be disabled, so you might as well remove the Normalize transformation completely.
You might also see slower convergence, since your input data would no longer be standardized.
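For illustration, a minimal sketch of what such a pipeline could look like with the Normalize transform left out (the grayscale conversion is an assumption for your single-channel IR images):

```python
import torchvision.transforms as transforms

# ToTensor() already scales the pixel values to [0, 1];
# the Normalize transform is simply left out (or commented out) here.
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # assumption: single-channel IR/grayscale input
    transforms.ToTensor(),
    # transforms.Normalize(mean=[...], std=[...]),  # disabled
])
```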
I want to process infrared images, which I convert to grayscale, and I am not sure how to normalize them properly. I have no access to the temperature information that is stored in the images.
Is there a paper that describes why and how normalization should be done for different types of images?
In the common approach (e.g. training on ImageNet) you would first normalize the images to the value range [0, 1] (this is done via transforms.ToTensor()) and afterwards normalize/standardize these tensors so that they have zero mean and unit variance via Normalize(mean=..., std=...). These stats were previously calculated from the training dataset, and you could do the same: iterate the training data once, calculate the mean and std of all training samples, and store these stats.
If that's not possible for some reason, you might just normalize to the [0, 1] range and check if this already allows the model to converge.
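A rough sketch of how these stats could be computed (assuming a train_dataset that already returns single-channel image tensors scaled to [0, 1] via ToTensor; the dataset name is a placeholder):

```python
# Assumption: train_dataset yields (image, target) pairs where image is a
# single-channel tensor scaled to [0, 1] by transforms.ToTensor().
pixel_sum = 0.0
pixel_sq_sum = 0.0
n_pixels = 0

for image, _ in train_dataset:
    pixel_sum += image.sum().item()
    pixel_sq_sum += image.pow(2).sum().item()
    n_pixels += image.numel()

mean = pixel_sum / n_pixels
std = (pixel_sq_sum / n_pixels - mean ** 2) ** 0.5
print(mean, std)

# These stats would then be reused in the transform pipeline, e.g.
# transforms.Normalize(mean=[mean], std=[std])
```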
The images are normalized to [0, 1]. I can derive the mean and std from the training dataset, but what if the conditions under which the images are taken for inference are different? This would lead to a different mean and std. If the conditions were the same for all images, this would work, but I'm worried about what happens when the conditions change.
During training the loss indicates that the model converges; the loss goes down very fast. But the AP does not look good:
Test: [ 0/500] eta: 0:13:52 model_time: 0.5020 (0.5020) evaluator_time: 0.0313 (0.0313) time: 1.6658 data: 1.1165 max mem: 6069
Test: [100/500] eta: 0:01:52 model_time: 0.2344 (0.2321) evaluator_time: 0.0156 (0.0232) time: 0.2664 data: 0.0102 max mem: 6069
Test: [200/500] eta: 0:01:22 model_time: 0.2344 (0.2309) evaluator_time: 0.0156 (0.0233) time: 0.2680 data: 0.0039 max mem: 6069
Test: [300/500] eta: 0:00:54 model_time: 0.2344 (0.2305) evaluator_time: 0.0156 (0.0230) time: 0.2666 data: 0.0078 max mem: 6069
Test: [400/500] eta: 0:00:27 model_time: 0.2344 (0.2306) evaluator_time: 0.0157 (0.0236) time: 0.2680 data: 0.0039 max mem: 6069
Test: [499/500] eta: 0:00:00 model_time: 0.2344 (0.2302) evaluator_time: 0.0156 (0.0236) time: 0.2672 data: 0.0055 max mem: 6069
Test: Total time: 0:02:15 (0.2714 s / it)
Averaged stats: model_time: 0.2344 (0.2302) evaluator_time: 0.0156 (0.0236)
Accumulating evaluation results...
DONE (t=0.28s).
Accumulating evaluation results...
DONE (t=0.38s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.015
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.015
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.015
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.015
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.009
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.001
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.011
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.014
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.014
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.014
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.014
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.009
The detection of the bounding box and mask is nearly perfect when I test on some images, but the classification is very bad: each of the two classes gets a score of around 0.5.
I have no idea how to improve this.
This is a general concern and not specific to normalization.
Even if you are not normalizing the input data, the model would still “learn” the training data distribution. If the validation or test data distribution changes, worse performance would be expected.