Custom Dataset using the tifffile Python library

Hi to all,
This is my first message here, and I am brand new to PyTorch and AI.
I have trained a model and now I want to load unseen images into it so I can segment them.
To do that I defined a custom dataset class, MyUnseen, that takes the directory of the images, reads the files with the tifffile library, and applies some transformations to them.
I work with 3D stacks of images in TIF format.
My problem is that MyUnseen reads only the first slice of the 3D stack, not all the slices in the stack. I would like MyUnseen to read every slice; my stacks normally have 250 slices, so in the end my dataset needs to have a length of 250. I would really appreciate it if anyone could help me with this issue. Thanks in advance.

from glob import glob

import tifffile
import torchvision
from torch.utils.data import Dataset
from torchvision import transforms


class MyUnseen(Dataset):

    def __init__(self, raw_directory):
        self.raw = raw_directory
        self.data_list = glob(raw_directory)
        self.data_len = len(self.data_list)
        self.images = tifffile.imread(self.raw)

    def __getitem__(self, idx):
        images = self.images[idx]
        images = normalize_custom(images)  # apply my custom normalization function

        images_tensor = torchvision.transforms.ToTensor()(images)

        # pad the slice up to 512 pixels
        padding = int((512 - images_tensor.shape[2]) / 2)
        images_ten = transforms.Pad(padding)(images_tensor)

        return images_ten

    def __len__(self):
        return self.data_len

I’m not sure how to interpret your custom Dataset, as it seems only a single image is read in:

self.images = tifffile.imread(self.raw)

or is this method able to read multiple images?
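
As a side note, self.data_len = len(self.data_list) counts the files matched by glob, not the slices, so if the directory contains a single multi-page TIFF, __len__ would return 1 and the DataLoader would only ever request index 0. A minimal sketch of __init__, assuming one stack whose first axis holds the slices, could be:

def __init__(self, raw_directory):
    self.raw = raw_directory
    self.images = tifffile.imread(self.raw)  # assumed shape: (slices, H, W)
    self.data_len = self.images.shape[0]     # one sample per slice, not per file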

Yes, tifffile.imread is able to read 3D stacks of TIF files as well, and it returns them as a numpy array.
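
For example (a quick sketch, where stack.tif stands in for one of my files):

import tifffile

volume = tifffile.imread('stack.tif')  # hypothetical multi-page TIFF, one 2D slice per page
print(volume.shape)  # e.g. (250, 512, 512), returned as a numpy array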
So it seems my workaround is the code below; I don't know if it is the easiest or most logical solution. Inside a loop I append the transformed images to a list, and outside the loop I use torch.stack.
(Actually this solution was found in another post in this forum 🤭)

def __getitem__(self, idx):
    list_of_images = []
    for file in tqdm(self.images):
        images = normalize_custom(file)  # apply my custom normalization function
        images_np = np.expand_dims(images, 0)  # add a channel dimension

        images_tensor = torch.from_numpy(images_np).float()
        # pad the slice up to 512 pixels
        padding = int((512 - images_tensor.shape[1]) / 2)
        images_ten = transforms.Pad(padding)(images_tensor)
        list_of_images.append(images_ten)
    images_stack = torch.stack(list_of_images, dim=0)
    return images_stack

Thanks for clarifying.
In your current code it seems you are loading and processing all images in each __getitem__ call, which would mean that setting batch_size=1 in the DataLoader would give you the entire dataset. Is this what you really want?
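
If the goal is one sample per slice, a minimal sketch of __getitem__ (reusing the normalize_custom call and padding logic from your first post) could look like this:

def __getitem__(self, idx):
    image = normalize_custom(self.images[idx])  # process a single 2D slice
    image_tensor = torch.from_numpy(np.expand_dims(image, 0)).float()
    padding = int((512 - image_tensor.shape[2]) / 2)  # pad the width up to 512
    return transforms.Pad(padding)(image_tensor)

Together with a __len__ that returns the number of slices, the DataLoader would then create proper mini-batches of slices for you.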

My model was trained using DataParallel, and then I saved model.module.state_dict().
In the inference part I load the saved model using:
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_trained = MyModel()
checkpoint_fpath = ****

model_trained = model_trained.cuda(0)

model_trained = nn.DataParallel(model_trained)

checkpoint = torch.load(checkpoint_fpath, map_location=DEVICE)
model_trained.module.load_state_dict(checkpoint['state_dict'])
model_trained.eval()

For making my predictions I load my 3D stack of TIF files, make the necessary transformations from numpy to torch.Tensor, and name my images unseen_tensor.
Then I use

prediction_tensors = model_trained(unseen_tensor.to(DEVICE))

But I get a RuntimeError: Caught RuntimeError in replica 0 on device 0
RuntimeError: CUDA out of memory…

So I thought that if I used a dataset class and a DataLoader and then made my predictions in a loop, I could overcome this error.
I hope it now makes more sense what I am trying to do.
Thanks again for your help.

Split this line:

prediction_tensors = model_trained(unseen_tensor.to(DEVICE))

into a loop and iterate the data:

model_trained.eval()
preds = []
with torch.no_grad():
    for data in unseen_tensor:  # you can use another logic here to create mini-batches etc.
        pred = model_trained(data.unsqueeze(0).to(DEVICE))  # add a batch dimension and move the slice to the device
        preds.append(pred.detach().cpu())
preds = torch.cat(preds)  # concatenate the per-slice predictions
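
If you want actual mini-batches instead of single slices, a minimal sketch using the MyUnseen dataset from above (assuming its __getitem__ returns one padded slice per index; batch_size=8 is just a placeholder) could look like:

from torch.utils.data import DataLoader

loader = DataLoader(MyUnseen(raw_directory), batch_size=8)  # hypothetical batch size
preds = []
with torch.no_grad():
    for batch in loader:
        pred = model_trained(batch.to(DEVICE))  # the loader already adds the batch dimension
        preds.append(pred.detach().cpu())
preds = torch.cat(preds)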

Thanks, that helped a lot: segmenting my dataset went from 10 hours to only 30 minutes with this method. So the predictions are running on the CPU, right? I mean because of the pred.detach().cpu()?

No, the forward pass still runs on the GPU; the output prediction is then detached from the computation graph and moved to the host (i.e. CPU memory) to avoid storing these tensors on the GPU.
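
For example, in the loop above:

pred = model_trained(data.unsqueeze(0).to(DEVICE))  # the forward pass executes on the GPU
preds.append(pred.detach().cpu())  # detach from autograd and copy the result to host (CPU) memory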

Ok, thank you! I will need to take a closer look at the documentation of detach() and at what moving tensors to the host means.