Input range for TorchVision with Batteries Included?

In the TorchVision with Batteries Included effort, I couldn’t find anything in the blog post or the torchvision ConvNext documentation that states the expected input pixel value range, before or after the transform.

Are the pixels expected to be in the 0-1 range? 0-255 range? Something else?

The transforms for ConvNext all reuse ImageClassification, as seen here, which accepts a PIL.Image, scales it to [0, 1] first, and then normalizes it as described here. So I assume the input can be a plain uint8 PIL.Image in [0, 255], provided you are using the predefined transformation.
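To make the two steps concrete, here is a minimal sketch of the per-pixel arithmetic in plain Python. It assumes the usual ImageNet mean and std that the torchvision classification presets default to; the name `preprocess_pixel` is made up for illustration, and the resize and center-crop that the preset also performs are left out.

```python
# ImageNet statistics (assumed here; check the transforms attached to the
# weights you load for the exact values they use).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def preprocess_pixel(rgb_uint8):
    """Map one RGB pixel from [0, 255] to the normalized model input.

    Hypothetical helper: it mirrors only the "scale, then normalize"
    part of the preset, not the resizing or cropping.
    """
    out = []
    for value, mean, std in zip(rgb_uint8, IMAGENET_MEAN, IMAGENET_STD):
        scaled = value / 255.0             # step 1: rescale to [0, 1]
        out.append((scaled - mean) / std)  # step 2: per-channel normalization
    return tuple(out)


black = preprocess_pixel((0, 0, 0))
white = preprocess_pixel((255, 255, 255))
print("black:", black)
print("white:", white)
```

With these constants the values the network sees span roughly -2.1 to 2.6, so neither [0, 1] nor [0, 255] is what reaches the first layer.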

Fantastic, thank you. I had seen your first link, but didn’t mentally parse the partial call over to ImageClassification. Much appreciated!

Let me know if this works or if you are seeing unexpected results, as I’ve only checked the source code without verifying it.

My use-case is applying the ConvNext encoder for segmentation. I have been providing input in the [-1, 1] range and it works quite well, but now that you pointed me to the normalization constants I’ll try the usual scale to [0, 1] and then normalize approach and let you know if I get better results.
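For reference, this is roughly how the two conventions relate, as a sketch in plain Python. It assumes the standard ImageNet mean and std, and `signed_to_imagenet` is a hypothetical helper name, not something from torchvision.

```python
# Assumed ImageNet statistics; verify against the weights' own transforms.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def signed_to_imagenet(value, channel):
    """Convert a value in [-1, 1] to the ImageNet-normalized value.

    `channel` is 0, 1, or 2 for R, G, B. Hypothetical helper for
    illustration only.
    """
    scaled = (value + 1.0) / 2.0  # undo the [-1, 1] mapping, back to [0, 1]
    return (scaled - IMAGENET_MEAN[channel]) / IMAGENET_STD[channel]


# A mid-gray pixel is 0.0 in the [-1, 1] convention, but not after
# ImageNet normalization, and the result differs per channel.
for channel, name in enumerate("RGB"):
    print(name, signed_to_imagenet(0.0, channel))
```

Feeding [-1, 1] directly is the same as normalizing with mean 0.5 and std 0.5 on every channel, so the pretrained encoder sees inputs that are shifted and scaled differently from what it was trained on.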

It will take quite a while before I can comment on the final metrics, but the training and validation losses are now decreasing faster than they were before. So it seems that identifying the right normalization range has been useful. Thanks again.