Working with very large .tif files

Hi,

I have a dataset which comprises over a thousand high-resolution whole-slide digital pathology images, and my goal is to create a classifier.

The problem I’m facing is that I can’t train an image classifier because I run out of memory [tried to allocate more memory than is available. Session has restarted.]. Each .tif file has dimensions (60797, 34007, 3), and I want to scale them down without losing critical information.

Can anyone help me with how to work with these huge .tif files? Thanks.

A tensor of shape (60797, 34007, 3) is a serious memory problem. If you can’t scale the images down, then all I can suggest is to crop them into meaningful small patches.
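For example, something along these lines (just a rough sketch: the 512-pixel patch size is arbitrary, and it still assumes a whole slide fits in RAM while you crop it):

import numpy as np
from tifffile import imread

def extract_patches(image, patch_size=512):
    # Crop an (H, W, 3) array into non-overlapping patch_size x patch_size tiles.
    h, w, _ = image.shape
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches

image = imread('slide.tif')        # 'slide.tif' stands in for one of your files
patches = extract_patches(image)   # list of (512, 512, 3) views into the array

In practice you would save the informative patches to disk (or filter out the mostly-background ones) and train on those instead of the full slides.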

Thanks,

but I’m just a beginner, and below is my code. Could you please show me how to convert the images into small patches in this setup?

import os

import numpy as np
import pandas as pd
import torch
from tifffile import imread
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms


class ImageData(Dataset):
    def __init__(self, df, data_dir, transform):
        super().__init__()
        self.df = df
        self.data_dir = data_dir
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        img_name = str(self.df.image_id[index]) + '.tif'
        label = self.df.label[index]
        img_path = os.path.join(self.data_dir, img_name)

        # Reads the full-resolution slide into memory as an (H, W, 3) array.
        image = imread(img_path)

        image = self.transform(image)
        return image, torch.tensor(label)


data_transf = transforms.Compose([transforms.ToPILImage('RGB'),
                                  transforms.Resize((224, 224)),
                                  transforms.RandomRotation(137.5),
                                  transforms.ToTensor(),
                                  transforms.Normalize(mean=[0.610, 0.377, 0.233],
                                                       std=[0.377, 0.233, 0.144])])

train_data = ImageData(df=df, data_dir=image_path, transform=data_transf)
trainloader = DataLoader(dataset=train_data, batch_size=4)

And I forgot to mention: my memory issue [tried to allocate more memory than is available. Session has restarted.] occurs when the training loop starts.

From the code that you shared, it seems like you are reducing the image from (60797, 34007, 3) to (224, 224, 3) and then applying a random rotation and a few other transformations. The question now is where you are getting the memory error: on the CPU or on the GPU?

The DataLoader returns tensors of size (3, 224, 224), which should not cause memory problems on the GPU unless you use a big batch size.

True, it’s not happening on the GPU, but when my program executes this line of code in the training loop:

for images, labels in trainloader:

my CPU RAM reaches its limit and I get the above-mentioned error. My machine has 13 GB of RAM.

Excellent! So this means that when you load these big images, they take up all the space in your RAM. The solution would be to do the preprocessing ahead of training, so that you don’t run into this issue inside the training loop.
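Something like this, run once before training, is roughly what I mean (a sketch only: the output folder name and PNG format are my assumptions, and it still loads one full slide at a time, so it may be tight on 13 GB of RAM):

import os
from PIL import Image
from tifffile import imread

OUT_DIR = 'resized_224'                        # hypothetical output folder
os.makedirs(OUT_DIR, exist_ok=True)

# df and image_path are the same objects as in your training script.
for image_id in df.image_id:
    src = os.path.join(image_path, str(image_id) + '.tif')
    img = imread(src)                          # loads the full slide (roughly 6 GB as uint8)
    small = Image.fromarray(img).resize((224, 224))
    small.save(os.path.join(OUT_DIR, str(image_id) + '.png'))
    del img                                    # release the big array before the next slide

Your Dataset would then read the small PNGs instead of the .tif files, and you could drop ToPILImage and Resize from the transform.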

So, do you mean that manually converting the images from (60797, 34007, 3) to (224, 224, 3) ahead of time would solve the problem?

Let me give it a try, then.

Note that each image of shape (60797, 34007, 3) will take ~23 GB in float32 and ~5.8 GB in uint8, which might already fill up your host RAM.
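For reference, that is just the number of elements times the bytes per element:

h, w, c = 60797, 34007, 3
print(h * w * c / 2**30)        # ~5.8 GiB as uint8 (1 byte per value)
print(h * w * c * 4 / 2**30)    # ~23.1 GiB as float32 (4 bytes per value)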

Yes, I’m unable to load even a single image. Any ideas on how to deal with this kind of problem? Each image is at least 1.2 GB in size.
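If your .tif files are saved as tiled (or pyramidal) TIFFs, you may not have to read the whole array at once: tifffile can expose the file as a zarr array and decode only the tiles you actually slice. A rough sketch (whether your files are tiled, and the 512-pixel crop, are assumptions):

import zarr
from tifffile import imread

# Open the TIFF lazily as a zarr store; nothing is decoded into RAM yet.
store = imread('slide.tif', aszarr=True)   # 'slide.tif' stands in for one of your files
z = zarr.open(store, mode='r')

# A plain tiled TIFF appears as one big (H, W, 3) array on disk; a pyramidal
# TIFF appears as a group whose members z['0'], z['1'], ... are the levels.
if isinstance(z, zarr.Group):
    z = z['0']                             # full resolution (pick a higher index for a smaller level)

# Slice out a 512 x 512 patch; only the tiles covering it are read and decoded.
patch = z[:512, :512]
store.close()

If the slides come from a scanner that writes pyramidal files, OpenSlide’s read_region is another common way to read regions and downsampled levels without touching the full image.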