Dataset from dictionary of images


I feel like this should be easy but I'm actually at a loss as to where to start (I'm recovering from C19, so I'll blame that if I'm being especially slow).

I've trained my models using datasets loaded from folders, and that all makes sense.

For inference, the images I'm using are huge microscopy images (30-100K x 4-8K pixels) and not pre-split (which had to be done to label them for training).

I can easily split them into a dictionary of PIL images (~30K images) that lives comfortably in memory.
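For anyone with the same setup, the tiling step can be sketched roughly like this; the tile size, the key scheme, and the helper name are assumptions for illustration, not my actual code:

```python
from PIL import Image

def tile_image(img, tile_size=512):
    # Hypothetical helper: split a PIL image into non-overlapping tiles,
    # keyed by their (left, top) pixel offset. Edge tiles may be smaller.
    tiles = {}
    w, h = img.size
    for top in range(0, h, tile_size):
        for left in range(0, w, tile_size):
            box = (left, top, min(left + tile_size, w), min(top + tile_size, h))
            tiles[(left, top)] = img.crop(box)
    return tiles
```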

Is there any way to use that as the input to a data loader - so I can use transforms and do the inference without saving the intermediate files?

Thanks in advance!

You can create a custom Dataset similar to this example, where you perform the transformation within __getitem__ and return the result without saving anything to disk.
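A minimal sketch of that idea, with placeholder names (in practice you would subclass torch.utils.data.Dataset, but it's shown torch-free here since a map-style dataset only strictly needs __len__ and __getitem__):

```python
# Minimal sketch (placeholder names): a map-style dataset over a dict of
# images, applying the transform on the fly inside __getitem__.
class DictImageDataset:
    def __init__(self, image_dict, transform=None):
        self.image_dict = image_dict
        self.keys = list(image_dict.keys())  # freeze an iteration order
        self.transform = transform

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        img = self.image_dict[self.keys[idx]]
        if self.transform is not None:
            img = self.transform(img)  # e.g. a torchvision transform
        return img
```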


Thanks @nivek - that does make it a little clearer.

I think I can guess and hope I don’t blow anything up off the back of that. If I get it working I’ll post it back here so there is an actual example somewhere.

Realized I never did do this!

from torch.utils.data import Dataset

class DataFromDict(Dataset):
    def __init__(self, input_dict):
        self.input_dict = input_dict
        self.input_keys = list(input_dict.keys())

    def __len__(self):
        return len(self.input_keys)

    def __getitem__(self, idx):
        # 'item_key' / 'label_key' are whatever fields your dict stores
        item = self.input_dict[self.input_keys[idx]]['item_key']
        label = self.input_dict[self.input_keys[idx]]['label_key']
        return item, label

That's not exactly how I did it, since I didn't need the label and so abused it by returning something slightly different, but the above should work.

The list of keys shouldn't strictly be needed in Python 3.7+, since dicts preserve insertion order, but storing it makes the ordering explicit and fixed. This should work as a starter for 10 when anyone comes looking at a similar problem.
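For completeness, here's a hedged sketch of running batched inference over such a dataset with a DataLoader. The tile dict, the dataset wrapper, and the model below are all stand-ins made up for illustration, not my actual pipeline:

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Stand-in "images": random tensors keyed like a tile dict.
tiles = {f"tile_{i}": torch.randn(3, 8, 8) for i in range(4)}

class TileDataset(Dataset):
    def __init__(self, d):
        self.d = d
        self.keys = list(d)  # fixed iteration order

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        # Return the key too, so predictions can be mapped back to tiles.
        return self.keys[idx], self.d[self.keys[idx]]

model = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)  # stand-in model
model.eval()

loader = DataLoader(TileDataset(tiles), batch_size=2, shuffle=False)
preds = {}
with torch.no_grad():
    for keys, batch in loader:
        out = model(batch)
        for k, p in zip(keys, out):
            preds[k] = p  # per-tile prediction, keyed like the input dict
```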