Hardcoded numbers in image transforms?


I am getting started with PyTorch by browsing code, and in a lot of example code I see magic numbers used without explanation or comments. What do these numbers mean? I come from a C/C++ background, where it is considered bad practice to use bare numbers in code; we prefer #defines at the top that explain the values. These numbers do not seem specific to one project or to one author's idiosyncrasies, though; they seem more like a PyTorch norm.
For example, in the PyTorch tutorial page, the parameters to the Normalize function:

        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])

Other examples are the permute and view calls in one author's code on GitHub:

    im = cv2.imread(dirpath + '/' + file)
    im = torch.Tensor(im).permute(2, 0, 1).view(1, 3, 224, 224).double()
    im -= torch.Tensor(np.array([129.1863, 104.7624, 93.5940])).double().view(1, 3, 1, 1)

Thanks for the help!

These numbers are the per-channel mean and standard deviation of the ImageNet dataset (info about pretrained models).
As these stats were used to pretrain the models, they should also work for finetuning.
However, if your dataset samples come from another domain (e.g. medical images), you would most likely want to use the stats from your training dataset.
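To make that concrete, here is a minimal sketch of how you might compute per-channel stats from your own training images and apply them the same way Normalize does, i.e. (x - mean) / std per channel. The helper names channel_stats and normalize and the random stand-in data are my own assumptions for illustration, not part of any tutorial or library.

```python
import torch

def channel_stats(images: torch.Tensor):
    """Per-channel mean and std of a float batch shaped (N, C, H, W)."""
    # Reduce over batch, height and width; keep the channel axis.
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3))
    return mean, std

def normalize(images: torch.Tensor, mean: torch.Tensor, std: torch.Tensor):
    """Subtract the channel mean and divide by the channel std."""
    # Reshape the stats to (1, C, 1, 1) so they broadcast over the batch.
    return (images - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)

# Hypothetical stand-in for a real training set: 8 RGB images in [0, 1].
torch.manual_seed(0)
fake_images = torch.rand(8, 3, 32, 32)

mean, std = channel_stats(fake_images)
normalized = normalize(fake_images, mean, std)
```

For a dataset that does not fit in memory you would accumulate these sums batch by batch instead, but the idea is the same.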


Great, thanks for the quick response. Useful link. That helps me fully understand Normalize and the first example.

Regarding the second example: is there any convention for transforms like permute or view?
When data comes from the dataloader, does it apply any of these transforms automatically? And because this example reads the image directly with cv2.imread, without using a data-loading abstraction, are these transforms done by hand by the author? I do not have other popular repositories to point to at the moment; I am currently looking at this example: Repo Link. But I can get back with more examples where a lot of image manipulation is done with operations that seem obvious to the PyTorch community but not to an outsider.

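Permute: OpenCV returns the image as a height x width x channels array (with the channels in BGR rather than RGB order), while PyTorch models expect channels x height x width. permute(2, 0, 1) moves the channel axis to the front. Note that it only reorders the dimensions; it does not swap BGR to RGB.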

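View: view(1, 3, 224, 224) adds a leading batch dimension of size 1, giving the batch x channels x height x width shape the model expects. It only works here because the image is already 224 x 224; view does not resize anything.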
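A small sketch of those two steps, using a synthetic tensor as a stand-in for what cv2.imread returns (that stand-in and the constant channel values are assumptions so the snippet runs without OpenCV or an image file). I use unsqueeze(0) to add the batch dimension, which gives the same shape as the view call in the question, and show a channel flip separately in case the model wants RGB.

```python
import torch

# Stand-in for a cv2.imread result: height x width x channels, BGR order.
hwc = torch.zeros((224, 224, 3), dtype=torch.uint8)
hwc[..., 0] = 10  # blue channel
hwc[..., 1] = 20  # green channel
hwc[..., 2] = 30  # red channel

im = hwc.double()

# Move the channel axis to the front: (H, W, C) -> (C, H, W).
chw = im.permute(2, 0, 1)

# Add a batch dimension of size 1: (C, H, W) -> (1, C, H, W).
batch = chw.unsqueeze(0)

# permute did not change the channel order; channel 0 is still blue.
# Reversing the channel axis is what converts BGR to RGB.
rgb = batch.flip(1)

print(chw.shape, batch.shape)
```

When you go through torchvision's ToTensor transform on a PIL image instead, the conversion to channels-first layout is handled for you, which is why you mostly see these manual calls in code that loads images with OpenCV.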
