Thank you for your response.
The figure below illustrates this better; for simplicity, it shows only the R channel:
The depth values are quantized and then used to convert the RGB images into RGB-D voxel representations. Currently, the depth tensor contains a single depth value per pixel. The values range from 0 to 400, so they can be quantized into 4 intervals. Do you have any ideas or advice on how I can use the depth tensor to produce this kind of input representation? Thanks again.
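
To make the question concrete, here is a minimal NumPy sketch of one possible reading of the representation, not a definitive implementation. It assumes the image is stored as (3, H, W), the depth map as (H, W), and that the 4 intervals are equal-width bins of 100 over the range 0 to 400. It also assumes that each pixel's colour should be placed in the depth slice matching its quantized depth, with zeros in the other slices. The function name `rgbd_to_voxels` and its parameters are hypothetical.

```python
import numpy as np

def rgbd_to_voxels(rgb, depth, num_bins=4, max_depth=400.0):
    """Hypothetical helper: rgb (3, H, W) + depth (H, W) -> voxels (3, num_bins, H, W)."""
    # Assumption: equal-width bins, e.g. [0, 100), [100, 200), [200, 300), [300, 400].
    bin_width = max_depth / num_bins
    # Quantize each depth value to a bin index; clip so depth == max_depth falls in the last bin.
    bin_idx = np.clip((depth // bin_width).astype(np.int64), 0, num_bins - 1)
    # One-hot mask over the depth axis: mask[d, y, x] is True where pixel (y, x) is in bin d.
    mask = bin_idx[None, :, :] == np.arange(num_bins)[:, None, None]
    # Broadcast (3, 1, H, W) * (1, D, H, W): each colour value lands in exactly one depth slice.
    return rgb[:, None, :, :] * mask[None, :, :, :]

# Tiny example: a 2x2 image with one pixel in each depth interval.
rgb = np.arange(12, dtype=np.float32).reshape(3, 2, 2) + 1
depth = np.array([[0.0, 150.0], [250.0, 400.0]])
voxels = rgbd_to_voxels(rgb, depth)
print(voxels.shape)  # (3, 4, 2, 2)
```

Summing the result over the depth axis gives back the original image, which is a quick way to check that every pixel was assigned to exactly one slice. The same comparison-and-broadcast idea should carry over to a tensor library, though I have not verified that here.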
