Normalizing 16-bit Medical Images

Hi there,

I just started using PyTorch and want to build a patch classifier for breast mammography. The thing is, my image patches are in the range [0, 65535], and I just found out that the ToTensor() operation is treating my images as if they were 8-bit. Here is the code I am currently using to load my dataset:

data_transforms = {

    'train': transforms.Compose([




    'val':  transforms.Compose([



    'test':  transforms.Compose([



image_datasets = {x: datasets.ImageFolder(os.path.join(raw_images_root_dir, x), data_transforms[x])
                  for x in ['train', 'val', 'test']}
dataloaders = {x :[x], batch_size=32, shuffle=True, num_workers=multiprocessing.cpu_count())
                for x in ['train', 'val', 'test']}
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

When I try to train my model I am getting REALLY BAD RESULTS: basically 50% accuracy throughout the entire training. Here is my training setup:

criterion = nn.CrossEntropyLoss()

learning_rate = 0.01

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

model = train_model(model, criterion, optimizer, scheduler, dataloaders, dataset_sizes, device, num_epochs=11)

Here is my training info:

[INFO]: Epoch 0/10
[INFO]: Epoch [0/10], Step [0/134], Loss: 0.6922
[INFO]: Epoch [0/10], Step [20/134], Loss: 0.7699
[INFO]: Epoch [0/10], Step [40/134], Loss: 0.6767
[INFO]: Epoch [0/10], Step [60/134], Loss: 0.7273
[INFO]: Epoch [0/10], Step [80/134], Loss: 0.7482
[INFO]: Epoch [0/10], Step [100/134], Loss: 0.8035
[INFO]: Epoch [0/10], Step [120/134], Loss: 0.7248

[INFO]: train accuracy: 0.5035
[INFO]: train loss: 0.8703

[INFO]: Epoch [0/10], Step [0/34], Loss: 30.9444
[INFO]: Epoch [0/10], Step [20/34], Loss: 38.3757

[INFO]: val accuracy: 0.5000
[INFO]: val loss: 28.2381

[INFO]: Epoch 1/10
[INFO]: Epoch [1/10], Step [0/134], Loss: 0.6472
[INFO]: Epoch [1/10], Step [20/134], Loss: 0.6883
[INFO]: Epoch [1/10], Step [40/134], Loss: 0.6616
[INFO]: Epoch [1/10], Step [60/134], Loss: 0.6833
[INFO]: Epoch [1/10], Step [80/134], Loss: 0.6401
[INFO]: Epoch [1/10], Step [100/134], Loss: 0.6897
[INFO]: Epoch [1/10], Step [120/134], Loss: 0.6746

[INFO]: train accuracy: 0.5091
[INFO]: train loss: 0.7010

[INFO]: Epoch [1/10], Step [0/34], Loss: 0.9954
[INFO]: Epoch [1/10], Step [20/34], Loss: 0.7296

[INFO]: val accuracy: 0.5000
[INFO]: val loss: 0.7585

[INFO]: Epoch 2/10
[INFO]: Epoch [2/10], Step [0/134], Loss: 0.7247
[INFO]: Epoch [2/10], Step [20/134], Loss: 0.7023
[INFO]: Epoch [2/10], Step [40/134], Loss: 0.6870
[INFO]: Epoch [2/10], Step [60/134], Loss: 0.6869
[INFO]: Epoch [2/10], Step [80/134], Loss: 0.6935
[INFO]: Epoch [2/10], Step [100/134], Loss: 0.7037
[INFO]: Epoch [2/10], Step [120/134], Loss: 0.6893

[INFO]: train accuracy: 0.5119
[INFO]: train loss: 0.6927

[INFO]: Epoch [2/10], Step [0/34], Loss: 0.6981
[INFO]: Epoch [2/10], Step [20/34], Loss: 0.7100

[INFO]: val accuracy: 0.5000
[INFO]: val loss: 0.6977

[INFO]: Epoch 3/10
[INFO]: Epoch [3/10], Step [0/134], Loss: 0.7029
[INFO]: Epoch [3/10], Step [20/134], Loss: 0.6941
[INFO]: Epoch [3/10], Step [40/134], Loss: 0.7016
[INFO]: Epoch [3/10], Step [60/134], Loss: 0.6886
[INFO]: Epoch [3/10], Step [80/134], Loss: 0.7015
[INFO]: Epoch [3/10], Step [100/134], Loss: 0.6877
[INFO]: Epoch [3/10], Step [120/134], Loss: 0.6901

[INFO]: train accuracy: 0.5021
[INFO]: train loss: 0.6919

[INFO]: Epoch [3/10], Step [0/34], Loss: 0.6654
[INFO]: Epoch [3/10], Step [20/34], Loss: 0.7031

[INFO]: val accuracy: 0.5000
[INFO]: val loss: 0.6976

[INFO]: Epoch 4/10
[INFO]: Epoch [4/10], Step [0/134], Loss: 0.6859

I guess this is because my images are NOT loaded correctly. When I inspect one of the images with the following code:

dataiter = iter(dataloaders['train'])

images, labels =

I get the following output:


How do I load my images correctly, taking into account the fact that the pixel values are in [0, 65535] (16-bit)?

Thank you!

In what format are your files?
Medical images usually come in a specific format (e.g. DICOM), and you need to apply a window to transform them into grayscale images.

Yes, I am starting with DICOM images, but when I read them with the pydicom library they are converted to N-dimensional NumPy arrays with 16-bit pixels.

Yes, pydicom returns an int16 array (or uint16, depending on the Pixel Representation tag).

If you want to view it as a greyscale image in RGB format, then you need to know what window level you’re using or need, and adjust appropriately before saving or displaying.

First you have to extract the rescale slope and intercept from the DICOM file and convert to Hounsfield units, hu = pixel_value * slope + intercept, and then apply a suitable window.
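The rescale-then-window step described above can be sketched as follows. This is only a minimal illustration using NumPy: the function name `window_image` and the sample values are made up for the example, the window is a simplified linear clip (not the exact DICOM LINEAR VOI function), and the rescaled values are true Hounsfield units only for CT data.

```python
import numpy as np

def window_image(pixels, slope, intercept, center, width):
    """Rescale raw pixel values and apply a linear window.

    Returns float32 values in [0, 1]. Hypothetical helper, not part of
    pydicom or torchvision.
    """
    # Rescale: value = pixel_value * slope + intercept
    values = pixels.astype(np.float32) * slope + intercept
    # Window bounds (simplified: center +/- half the width)
    lo = center - width / 2.0
    hi = center + width / 2.0
    # Clip to the window, then map [lo, hi] onto [0, 1]
    values = np.clip(values, lo, hi)
    return (values - lo) / (hi - lo)

# Illustrative values only: a 2x2 patch and a window of center 1070, width 1860
raw = np.array([[0, 140], [1070, 4095]], dtype=np.uint16)
out = window_image(raw, slope=1.0, intercept=0.0, center=1070.0, width=1860.0)
```

Keeping the result as floats in [0, 1] avoids throwing away precision; multiplying by 255 and casting to uint8 is only needed if the windowed image has to be saved as an ordinary 8-bit file.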

These resources may be useful to you:

This code can also be useful; see step 2 in particular:

Thank you so much for your help!

Hi @ivan-jgr, thanks for sharing. I read your proposed solution, but I can't get all the information I need from the DICOM file.

This is the 0x0028 group; from here I can read Rows (ds.Rows) and Columns (ds.Columns):

(0028, 0002) Samples per Pixel                   US: 1
(0028, 0004) Photometric Interpretation          CS: 'MONOCHROME2'
(0028, 0008) Number of Frames                    IS: "1"
(0028, 0010) Rows                                US: 512
(0028, 0011) Columns                             US: 512
(0028, 0100) Bits Allocated                      US: 16
(0028, 0101) Bits Stored                         US: 12
(0028, 0102) High Bit                            US: 11
(0028, 0103) Pixel Representation                US: 0
(0028, 0301) Burned In Annotation                CS: 'NO'
(0028, 2110) Lossy Image Compression             CS: '00'
(0028, 9001) Data Point Rows                     UL: 1
(0028, 9002) Data Point Columns                  UL: 0
(0028, 9003) Signal Domain Columns               CS: ''
(0028, 9108) Data Representation                 CS: ''

But calling ds.WindowWidth or ds.WindowCenter throws an error. I think it’s because those tags are nested inside a sequence, as shown below:

(0028, 9132)  Frame VOI LUT Sequence  1 item(s) ---- 
      (0028, 1050) Window Center                       DS: "1070.0"
      (0028, 1051) Window Width                        DS: "1860.0"
      (0028, 1055) Window Center & Width Explanation   LO: 'User selection'
      (0028, 1056) VOI LUT Function                    CS: 'LINEAR'

When I open the image in a DICOM viewer the tag is shown, but I don’t know how to access a (group, element) that is nested inside another (group, element) using pydicom.

Thanks for the help!
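One way to reach tags nested inside a sequence is to walk the whole dataset. The sketch below assumes `ds` is a pydicom Dataset and relies on its `iterall()` method, which also visits elements inside sequences; the helper name `find_window` is made up for this example, and it simply returns the first Window Center / Window Width it encounters.

```python
def find_window(ds):
    """Return (center, width) from the first window tags found anywhere in ds.

    Assumes `ds` behaves like a pydicom Dataset, i.e. `ds.iterall()` yields
    elements with `.keyword` and `.value`, including nested ones.
    """
    def first_float(value):
        # These tags may hold several values; fall back to the first one.
        try:
            return float(value)
        except TypeError:
            return float(value[0])

    center = width = None
    for elem in ds.iterall():
        if elem.keyword == "WindowCenter" and center is None:
            center = first_float(elem.value)
        elif elem.keyword == "WindowWidth" and width is None:
            width = first_float(elem.value)
    return center, width
```

Typical use would be `ds = pydicom.dcmread(path)` followed by `center, width = find_window(ds)`. If the sequence sits at the top level of the dataset, indexing it directly, e.g. `ds[0x0028, 0x9132].value[0].WindowCenter`, should also work, but in multi-frame files it is often nested one level deeper, which is why the sketch searches recursively.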

Hello Ivan!

Are you suggesting that we save the windowed images? Do we need to export them as 8-bit?
I want to train my model with 16-bit images. Is this possible?

Hi! I’m planning to do the same as you, for the same reason (getting bad results). Does PyTorch change my 16-bit images to 8-bit during the transforms?
I think windowing is not necessary when training DL models: we can’t see that many shades of gray, but machines can!

Hi @Pcamellon,

Yes, you can train your model with 16-bit tensors. Just remember that a standard 8-bit image file can only hold values in [0, 255].

The usual approach is to apply the window and save the result as an image, then feed the network with those images (you can resize them and use the torchvision transforms on them). If you want to use the original Hounsfield values instead, you can write your own loader for the DICOM files.

Also remember that ToTensor() scales uint8 images from [0, 255] to [0, 1]. As far as I know, it does not rescale other dtypes, so 16-bit data in [0, 65535] has to be divided by 65535 explicitly to end up in [0, 1].

Hope this helps you.

Thanks Ivan for your answer!

My images are 16-bit PNGs, so the values range over [0, 65535]. Does PyTorch change these to 8 bits before the ToTensor() transformation?

So to get floats in [0, 1], each element of the image matrix is divided by 65535 (16 bits in my case), and the interval [0, 65535] maps onto [0, 1]. Am I right?

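ToTensor() doesn’t change your 16 bits to 8 bits. Do check the output range, though: as far as I know the division is only applied to uint8 input, so for 16-bit data you may have to divide by 65535 yourself to get values in [0, 1].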
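The scaling discussed above, written out explicitly as a minimal NumPy sketch. The helper name `to_unit_range` and its `bits` parameter are made up for the example; the idea is simply to divide by the largest value the bit depth can hold.

```python
import numpy as np

def to_unit_range(pixels, bits=16):
    """Convert unsigned integer pixels to float32 in [0, 1].

    Hypothetical helper: `bits` is the number of bits actually used,
    e.g. 16 for full-range PNGs or 12 when Bits Stored is 12.
    """
    max_value = float(2 ** bits - 1)  # 65535 for 16 bits, 4095 for 12 bits
    return pixels.astype(np.float32) / max_value

patch = np.array([[0, 32768, 65535]], dtype=np.uint16)
scaled = to_unit_range(patch)
```

Note the divisor is 65535, not 65536, so that the maximum pixel value maps exactly to 1.0. The resulting float32 array can then be passed to `torch.from_numpy` to obtain a tensor.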

My images are 16-bit PNGs, so the values range over [0, 65535]. Does PyTorch change these to 8 bits before the ToTensor() transformation?

Exactly. The last comment here is really the key: write your own loader. This is a great resource to get started.
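For 16-bit PNG patches, a custom loader could look roughly like the sketch below. It assumes Pillow and NumPy are installed and that the files are single-channel 16-bit images; the function name `load_16bit_gray` is made up for the example. As far as I can tell, the default loader used by ImageFolder opens the file with PIL and calls `convert("RGB")`, which is where the 16-bit range gets lost, so the point here is to skip that conversion.

```python
import numpy as np
from PIL import Image

def load_16bit_gray(path):
    """Read a grayscale image without converting it to 8-bit RGB.

    Hypothetical helper. Returns a float32 array of shape (1, H, W)
    scaled to [0, 1], assuming the file holds 16-bit unsigned pixels.
    """
    with Image.open(path) as img:
        pixels = np.array(img)  # keeps the 16-bit values as stored
    # Add a channel axis and divide by the 16-bit maximum
    return pixels.astype(np.float32)[np.newaxis, ...] / 65535.0
```

ImageFolder accepts a `loader` argument, so something like `datasets.ImageFolder(root, loader=load_16bit_gray, transform=torch.from_numpy)` should be enough to plug this in; any further transforms then have to operate on tensors rather than PIL images.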

Thanks, I’ll check it out. I’ve already finished the project, though. I decided to go with 8 bits: 16 bits brought too many complications and wasn’t worth it.

OpenCV can handle uint16. However, does transforms.Compose only deal with PIL images, rather than np.ndarray?

Take a look here: transforms.PILToTensor()


And it looks like there is a transforms.ToPILImage(), which should be able to convert an np.ndarray to PIL. But, unfortunately, it does NOT handle uint16…

So many incompatibilities…