Preprocessing DataLoader batches before sending them through to the models

Hi there, (first time posting)

I am currently working on a university project where I'm aiming to use two different models on the same dataset and combine their results during training. My question is about preprocessing. My dataset consists of a variety of images.

For the first model, each image is converted to a numpy array that represents the information for each colour channel.

For the second model, facial elements of the image are extracted before being passed on to the model.

As you can see, each model requires different preprocessing before the data can be passed to it. My current approach is to load the entire image dataset into a DataLoader and then, while iterating through each batch, perform both kinds of preprocessing on the batch before passing the results on to the models. I wanted to check whether this is the correct approach.
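To make the batch-level approach concrete, here is a minimal sketch. The two preprocessing functions (`extract_colour_channels` and `extract_facial_elements`) are hypothetical placeholders standing in for your real colour-channel conversion and face-extraction steps, and the random tensors stand in for your image dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def extract_colour_channels(batch):
    # placeholder for model 1's preprocessing, e.g. per-channel scaling
    return batch / 255.0

def extract_facial_elements(batch):
    # placeholder for model 2's preprocessing, e.g. a centre crop
    # standing in for facial-element extraction
    return batch[:, :, 8:24, 8:24]

# fake image data: 16 RGB images of size 32x32 with binary labels
images = torch.randint(0, 256, (16, 3, 32, 32), dtype=torch.float32)
labels = torch.randint(0, 2, (16,))
loader = DataLoader(TensorDataset(images, labels), batch_size=4)

for batch, target in loader:
    input_a = extract_colour_channels(batch)   # fed to model 1
    input_b = extract_facial_elements(batch)   # fed to model 2
    # outputs = combine(model_a(input_a), model_b(input_b))
    # ... loss, backward, optimiser step ...
```

This works, but note that the preprocessing runs on the training loop's main process; moving it into the Dataset (as discussed below in this thread) lets DataLoader workers do it in parallel.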

Please let me know what you think. Thanks


For anyone who finds this and is interested:
My previous understanding was that `__getitem__` returned the data plus the class label. However, after playing around I realised it can return multiple items. My current approach is to return a tuple from `__getitem__`, where the first element is the data for my first model and the second element is the corresponding data for my second model.