How to normalize uint16 depth image for training?

The depth images rendered from the ScanNet dataset are stored as uint16. After dividing the depth by the shift (1000), the values fall in a range [0, d_max], where d_max is some positive float below 10. How do I normalize the depth to [0, 1] (per dataset) for training?
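For reference, this is roughly how the depth is converted (a sketch assuming the depth frame is a 16-bit PNG read with OpenCV; the path is hypothetical):

```python
import cv2
import numpy as np

# Hypothetical path; ScanNet-style depth frames are 16-bit PNGs storing millimeters.
depth_raw = cv2.imread("scene0000_00/depth/000000.png", cv2.IMREAD_UNCHANGED)
depth_m = depth_raw.astype(np.float32) / 1000.0  # divide by the shift -> meters
print(depth_m.min(), depth_m.max())
```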

So the question is whether you want per-dataset or per-image normalization. I would imagine that you want per-dataset (global) normalization because depth is a physical quantity (though in medical imaging it is not uncommon to use per-image normalization, in particular when working with images taken by different scanners). If you need the global maximum, you'd need to iterate over the dataset once during preprocessing and find it.
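In code, that preprocessing pass could look roughly like this (a sketch; the glob pattern and the shift of 1000 are assumptions about how the depth files are stored):

```python
import glob

import cv2
import numpy as np

# One pass over the training depth maps to find the per-dataset maximum (in meters).
max_depth = 0.0
for path in glob.glob("train/*/depth/*.png"):  # hypothetical directory layout
    depth = cv2.imread(path, cv2.IMREAD_UNCHANGED).astype(np.float32) / 1000.0
    max_depth = max(max_depth, float(depth.max()))

print("max depth over the training set:", max_depth)
```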

Best regards

Thomas

I need the per-dataset normalization. By "iterate over the dataset and find the maximum", do you mean over the training set only, or also including the test set?

The rules are that you should use the training set.
The next question then is what to do when the test set has a larger depth somewhere. I'd probably just clamp it down to 1. With a good training set, I would expect it to max out the depth reported by the camera somewhere.
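As a rough sketch of the clamping, assuming max_train_depth is the maximum found in the preprocessing pass:

```python
import numpy as np

def clamp_normalized_depth(depth_m, max_train_depth):
    """Scale by the training-set maximum and clamp anything beyond it to 1."""
    return np.clip(depth_m / max_train_depth, 0.0, 1.0)
```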

Best regards

Thomas

Thanks, Tom. So then I can normalize the depth using (depth - min_depth) / (max_depth - min_depth), where max_depth is the maximum depth value over the whole training set, right?

That's what I'd do, though you might just keep min_depth at 0.
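Putting it together, a minimal sketch of such a transform (min_depth kept at 0 and out-of-range values clamped to [0, 1] as discussed above; the class name, the example max_depth, and the tensor conversion at the end are just illustrative choices):

```python
import numpy as np
import torch


class DepthNormalize:
    """Min-max normalize a depth map (in meters) using training-set statistics."""

    def __init__(self, max_depth, min_depth=0.0):
        self.min_depth = min_depth
        self.max_depth = max_depth

    def __call__(self, depth_m):
        d = (depth_m - self.min_depth) / (self.max_depth - self.min_depth)
        d = np.clip(d, 0.0, 1.0)  # clamp test-time values beyond the training max
        return torch.from_numpy(d.astype(np.float32)).unsqueeze(0)  # 1 x H x W


# normalize = DepthNormalize(max_depth=9.3)  # 9.3 is a made-up training-set maximum
```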

I have a question regarding normalising depth images. How did you find the maximum depth value over the whole training set?