How can I know my image's pixel bit width?
It is the same as the number of channels of the image × 8.
As @Kushaj said, True color RGB images will use a bit depth of 24 (8 for each channel).
However, your images can of course come from another domain, which might use another bit depth value.
E.g. depth images often use a single channel with 16 bits, while DICOM images might use 12 bits.
If you can load the image via PIL, you could check the `image.depth` or `image.mode` attributes.
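For example (a small sketch; the mode-to-bit-depth mapping below is a hand-written lookup table for a few common modes, not a PIL API):

```python
from PIL import Image

# Create a small RGB image in memory as a stand-in for your loaded file
img = Image.new("RGB", (4, 4))
print(img.mode)  # 'RGB'

# Bits per pixel for a few common PIL modes (my own mapping, not part of PIL)
mode_to_bits = {"1": 1, "L": 8, "I;16": 16, "RGB": 24, "RGBA": 32}
print(mode_to_bits[img.mode])  # 24
```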
But the maximum and minimum values of my image pixels are 151.47 and -119.675. Basically, if we keep 1 bit for the sign bit, we then need more than 8 bits for the integer part itself; there is a bit overflow here.
If your max and min values are 151.47 and -119.675, respectively, your image would be encoded in a floating point format, not an 8-bit integer.
I don't fully understand the use case. What did `image.depth` or `image.mode` return?
Sorry @ptrblck for the very late reply. I am getting the mode as RGB, and for depth I am getting `AttributeError: 'JpegImageFile' object has no attribute 'depth'`.
It's a bit strange that Jpeg images in RGB return floating point values, as I would assume they are encoded using `uint8`. Could you post the results of:

```python
import numpy as np

img = ...  # load image
a = np.array(img)
print(np.max(a), np.min(a), a.dtype)
```
```python
import numpy as np
from PIL import Image

img = Image.open("/content/XNOR-Net-PyTorch/ImageNet/networks/data/val/n01440764/ILSVRC2012_val_00002138.JPEG")
a = np.array(img)
print(np.max(a), np.min(a), a.dtype)  # -> (255, 0, dtype('uint8'))
```
Thanks for the update. How do these min and max values fit with your previously reported values of 151.47 and -119.675?
My image undergoes transforms, so I guess that is where I am getting those values from:
```python
val_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(valdir, transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        normalize,
    ])),
    batch_size=args.batch_size, shuffle=False,
    num_workers=args.workers, pin_memory=True)
```
`ToTensor` normalizes `uint8` images to the range [0, 1], so the `normalize` transformation would need to use a really small `std` to blow these values up again.
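To illustrate with plain NumPy (a sketch of the scaling, not torchvision's actual implementation; the `mean` and `std` values are made up to show the effect):

```python
import numpy as np

# ToTensor-style scaling: uint8 [0, 255] -> float [0, 1]
a = np.array([0, 128, 255], dtype=np.uint8)
scaled = a.astype(np.float32) / 255.0

# Normalize-style: (x - mean) / std; a tiny std inflates the values again
mean, std = 0.5, 0.004
print((scaled - mean) / std)  # roughly [-125, 0.49, 125]
```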
Anyway, as you can see from your previous post, the bit width is 8 bits for your images: `dtype('uint8')`.
- As `uint8` suggests, all pixel values must be positive, right? But after undergoing the transforms my input has negative values.
`transforms.Normalize` subtracts the `mean` and divides by the `std`, so negative and positive values are expected. However, they are usually smaller in magnitude than your reported values, so feel free to post a full code snippet to reproduce these values.
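As a quick sketch of the expected magnitudes (using the commonly cited ImageNet red-channel statistics; assumed here for illustration):

```python
import numpy as np

mean, std = 0.485, 0.229  # typical ImageNet red-channel stats
x = np.array([0.0, 0.5, 1.0], dtype=np.float32)  # pixel values after ToTensor
print((x - mean) / std)
# negative below the mean, positive above; magnitudes stay below ~2.3
```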
- Here is the code where I am getting the input:
```python
def validate(val_loader, model, criterion):
    batch_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()

    # switch to evaluate mode
    model.eval()
    end = time.time()
    bin_op.binarization()
    for i, (input, target) in enumerate(val_loader):
        target = target.cuda(non_blocking=True)
        with torch.no_grad():
            input_var = torch.autograd.Variable(input)
            target_var = torch.autograd.Variable(target)
        print(input_var)
        # compute output
        output = model(input_var)
        loss = criterion(output, target_var)
```
* This is how my input sample looks. I am getting **floating point** values as input although the image is encoded in **uint8**.
* So I round it off to the nearest integer to get an 8-bit value.
print(input_var)
```
27.325001 13.325003 -56.674995 -62.674999 -66.675003 -70.675003
-64.675003 -57.674995 -18.675001 18.325003 -2.675003 3.324996
12.325003 16.325003 25.325001 -14.675002 1.324996 -12.675002
-27.674999 -38.674999 -23.675001 -23.675001 -37.674999 -16.675001
57.324997 -46.674999 -72.675003 -74.675003 -78.675003 -102.674995
-98.675003 -81.675003 19.325003 104.324989 101.324989 125.324989
125.324989 120.324989 125.324989 128.324982 127.324989 128.324982
125.324989 121.324989 123.324989 126.324989 126.324989 108.324989
110.324989 128.324982 128.324982 129.324982 129.324982 129.324982
129.324982 127.324989 129.324982 127.324989 122.324989 125.324989
129.324982 129.324982 127.324989 127.324989 129.324982 128.324982
129.324982 129.324982 130.324982 130.324982 128.324982 127.324989
128.324982 126.324989 128.324982 129.324982 128.324982 128.324982
127.324989 128.324982 128.324982 130.324982 129.324982 130.324982
130.324982 131.324982 131.324982 129.324982 129.324982 129.324982
129.324982 129.324982 129.324982 129.324982 127.324989 128.324982
128.324982 117.324989 14.325003 15.325003 21.325001 51.324997
-5.675003 -97.675003 -96.675003 -94.675003 -93.675003 -81.675003
60.324997 121.324989 119.324989 119.324989 121.324989 125.324989
125.324989 120.324989 119.324989 127.324989 128.324982 127.324989
118.324989 2.324996 -84.675003 -90.675003 -95.675003 -96.675003
-97.675003 -68.675003 33.325001 -0.675003 59.324997 103.324989
```
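Note that rounding these floats to the nearest integer does not fit into 8 signed bits; a small sketch of what clipping to `int8` would do, using a few values from the printout above:

```python
import numpy as np

x = np.array([27.325001, -56.674995, 151.47, -119.675], dtype=np.float32)

# Round, then clamp into the signed 8-bit range [-128, 127];
# 151.47 exceeds the range and gets clipped (information loss)
q = np.clip(np.rint(x), -128, 127).astype(np.int8)
print(q)  # [  27  -57  127 -120]
```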
What if it's NOT uint8, BUT uint16? Will `ToTensor()` normalize uint16 images to [0, 1] as well?
No, I don't think so, as seen here:

```python
import numpy as np
import PIL.Image
from torchvision import transforms

a = np.random.randint(0, 65535, (224, 224, 3), dtype=np.uint16)
img = PIL.Image.fromarray(a, 'I;16')
transform = transforms.ToTensor()
out = transform(img)
print(out.min(), out.max())
# tensor(-32767, dtype=torch.int16) tensor(32767, dtype=torch.int16)
```
As you can see, it'll also transform the data to `int16` here, as `uint16` is an unsupported tensor type.
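A common workaround (a sketch; the scaling constant 65535 is my assumption for full-range data) is to upcast in NumPy before handing the array to `torch.from_numpy`:

```python
import numpy as np

a = np.random.randint(0, 65536, (4, 4), dtype=np.uint16)

# Upcast losslessly to int32 (a dtype torch.from_numpy supports) ...
b = a.astype(np.int32)

# ... or scale directly to float32 in [0, 1] for training
c = a.astype(np.float32) / 65535.0
print(b.dtype, float(c.min()) >= 0.0, float(c.max()) <= 1.0)
```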
CC @fmassa @vfdev-5 @pmeier
I guess this is intentional due to the lack of `uint16` support in `torch.from_numpy`?
I would guess `uint16` wasn't added as a `dtype` because the demand for it might not be that large, since you could use e.g. `int32` with a memory overhead. Since these integer types are mainly used for inputs (the model won't be able to train with these `dtype`s directly), the memory overhead might also be negligible. Adding a new `dtype` would need to be piped through all methods, increasing the binary size significantly, which is another argument against adding new types unless their use cases justify it.
Makes sense. Just for your information: probably a niche domain, but 16-bit grayscale images are commonly used in microscopy imaging.
Yes, I agree that 16-bit image formats are used, but do you think a native `uint16` dtype would still need to be added to PyTorch? I would assume the common approach would be to load these images (e.g. via PIL, OpenCV, or another image library which supports this format) and transform them to a floating point tensor (`float32` by default), which is then used to train the model. Which operations would you like to apply directly on the `uint16` image in PyTorch?