I have a dataset which comprises over a thousand high-resolution whole-slide digital pathology images, and my goal is to create a classifier.
The problem I’m facing is that I’m unable to train an image classifier due to high memory usage [tried to allocate more memory than is available. Session has restarted.].
Each .tif file has dimensions of (60797, 34007, 3), and I want to scale them down without losing critical information.
Can anyone advise me on how to work with these huge .tif files? Thanks.
A tensor of shape (60797, 34007, 3) is a serious memory problem. If you can’t scale it down, then all I can suggest is cropping it into meaningful small patches.
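For illustration only, here is a minimal sketch of that idea using openslide-python (my assumption; the tile size, paths, and helper name are placeholders, and tifffile or pyvips would work just as well):

```python
import os

import openslide  # assumption: openslide-python is installed and can open your .tif files

TILE = 1024  # placeholder patch size at full resolution


def extract_patches(slide_path, out_dir):
    """Cut one whole-slide image into TILE x TILE patches without loading it all at once."""
    os.makedirs(out_dir, exist_ok=True)
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.dimensions  # e.g. (60797, 34007)
    for y in range(0, height, TILE):
        for x in range(0, width, TILE):
            # read_region decodes only the requested tile, so RAM usage stays per-patch
            patch = slide.read_region((x, y), 0, (TILE, TILE)).convert("RGB")
            patch.save(os.path.join(out_dir, f"{x}_{y}.png"))
    slide.close()
```

In practice you would also filter out mostly-white background tiles before saving, otherwise most of the patches carry no tissue.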
Also, I forgot to mention that my memory issue [tried to allocate more memory than is available. Session has restarted.] occurs when the training loop starts.
From the code you shared, it seems like you are reducing the image from (60797, 34007, 3) to (224, 224, 3) and then applying random rotations and several other transformations. The question now is where you are getting the memory error: on the CPU or on the GPU?
The DataLoader returns tensors of size (3, 224, 224), which should not cause memory problems on the GPU unless you use a large batch size.
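Just to make the batch-size point concrete, here is a rough sketch (FakeData stands in for your real dataset; the batch size and transforms are placeholders):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# A single (3, 224, 224) float32 tensor is 3 * 224 * 224 * 4 bytes ≈ 0.6 MB,
# so even a batch of 64 is only ~38 MB on the GPU.
transform = transforms.Compose([
    transforms.RandomRotation(15),  # placeholder for your augmentation pipeline
    transforms.ToTensor(),
])

# FakeData stands in for the real patch dataset, just to show the shapes involved.
dataset = datasets.FakeData(size=256, image_size=(3, 224, 224), transform=transform)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)  # torch.Size([64, 3, 224, 224])
print(images.element_size() * images.nelement() / 1e6, "MB per batch")
```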
Excellent! So this means that when you load these big images, they take up all of your RAM. The solution would be to do the preprocessing ahead of training so you don’t run into these issues.
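As a sketch of that idea, assuming the .tif files are pyramidal whole-slide images readable by openslide-python (the directories and target size below are placeholders):

```python
import os

import openslide  # assumption: openslide-python can open your .tif files

SRC_DIR = "slides"        # placeholder: directory with the original .tif files
DST_DIR = "preprocessed"  # placeholder: where the downscaled copies go
TARGET = (2048, 2048)     # placeholder: smallest size that keeps the detail you need

os.makedirs(DST_DIR, exist_ok=True)

# Run this once before training; the training DataLoader then reads the small files
# instead of decoding a (60797, 34007, 3) image inside the training loop.
for name in os.listdir(SRC_DIR):
    if not name.endswith(".tif"):
        continue
    slide = openslide.OpenSlide(os.path.join(SRC_DIR, name))
    # For pyramidal slides, get_thumbnail reads from a low-resolution level,
    # so the full-resolution array is never materialised in RAM.
    thumb = slide.get_thumbnail(TARGET)
    thumb.save(os.path.join(DST_DIR, name.replace(".tif", ".png")))
    slide.close()
```

Alternatively, save the patches from the earlier sketch to disk and train on those; either way, the expensive decoding happens once instead of on every epoch.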