Placing different but correlated images in multiple channels of a CNN

Let’s say we are building a CNN for a multi-class classification problem. Each instance (object) has 4 images, captured by a sequence of 4 detectors. The four detectors have different resolutions, so the images are correlated but vary in size. The images are simple grayscale images (a single channel, not RGB).

object : | image1 | image2 | image3 | image4 | label |

 [object] -> | image 1 detector | -> | image 2 detector | -> ... -> | image 4 detector |

We know the number of channels in the input layer is usually 3, standing for RGB. I have tried rescaling each type of image to the same size so that each image can be treated as a channel, which gives four channels in the input layer. Are there any better ideas for implementing this model? Are there any recent research papers related to this?
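To make the rescale-and-stack idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the image sizes and pixel values are made up, the target size of 4x4 is arbitrary, nearest-neighbour interpolation stands in for whatever resizing method is actually used, and the helper names `resize_nearest` and `stack_as_channels` are hypothetical.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D (list of lists) grayscale image."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def stack_as_channels(images, out_h, out_w):
    """Resize every single-channel image, then stack them as [C][H][W]."""
    return [resize_nearest(img, out_h, out_w) for img in images]


# Hypothetical detector outputs with different resolutions (values are made up).
image1 = [[1, 2], [3, 4]]                                   # 2x2
image2 = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4
image3 = [[5] * 8 for _ in range(8)]                        # 8x8
image4 = [[7] * 3 for _ in range(3)]                        # 3x3

# One input example for the network: 4 channels, each 4x4.
x = stack_as_channels([image1, image2, image3, image4], out_h=4, out_w=4)
shape = (len(x), len(x[0]), len(x[0][0]))  # (channels, height, width)
```

The resulting `x` has the channels-first layout (4, 4, 4), so the first convolutional layer would be configured with 4 input channels instead of 3.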

Thank you.