Double Siamese Network?

Hi guys, I’m working on a “different” kind of network, and I’m pretty new to PyTorch.
My idea is to have two different image pairs, each one processed by its own network. As output I want the depth map of each image pair, and then the mean between the two resulting maps, but I have no idea how to do that. I’m starting with the DrivingStereo dataset and have written some code, but I’m struggling with the basics, like loading the data.
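For reference, the architecture described above (two networks, each turning a stereo pair into a depth map, then averaging the two maps) could be sketched roughly like this. It's a minimal sketch under assumptions: `SiameseDepthNet` and `DoubleSiameseNet` are hypothetical names, the tiny encoder/decoder is a placeholder for a real stereo backbone, and the "Siamese" part is taken to mean that the left and right images of one pair go through the same encoder weights.

```python
import torch
import torch.nn as nn


class SiameseDepthNet(nn.Module):
    """Hypothetical stereo branch: one shared (Siamese) encoder applied to both
    images of a pair, then a small decoder predicting a 1-channel depth map."""

    def __init__(self, in_channels: int = 3, features: int = 32):
        super().__init__()
        # The same encoder weights process the left and the right image.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # The decoder sees both feature maps concatenated along the channel axis.
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * features, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, 1, kernel_size=3, padding=1),
        )

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        f_left = self.encoder(left)
        f_right = self.encoder(right)
        return self.decoder(torch.cat([f_left, f_right], dim=1))


class DoubleSiameseNet(nn.Module):
    """Two independent stereo branches whose depth maps are averaged."""

    def __init__(self):
        super().__init__()
        self.net_a = SiameseDepthNet()  # processes the first image pair
        self.net_b = SiameseDepthNet()  # processes the second image pair

    def forward(self, pair_a, pair_b):
        depth_a = self.net_a(*pair_a)  # pair_a = (left, right)
        depth_b = self.net_b(*pair_b)
        # Element-wise mean; both maps must have the same shape.
        depth_mean = (depth_a + depth_b) / 2
        return depth_a, depth_b, depth_mean


# Usage with random tensors standing in for two stereo pairs (batch of 2, 64x128 RGB).
model = DoubleSiameseNet()
left_a, right_a, left_b, right_b = (torch.randn(2, 3, 64, 128) for _ in range(4))
depth_a, depth_b, depth_mean = model((left_a, right_a), (left_b, right_b))
print(depth_mean.shape)  # torch.Size([2, 1, 64, 128])
```

Returning all three maps (not just the mean) makes it possible to put a loss on each branch as well as on the average. Whether `net_a` and `net_b` should share weights is a design choice; here they don't. Averaging also only makes sense if both depth maps refer to the same viewpoint and resolution.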

Here is my code so far:
Colab Notebook

I got this diagram from the internet. Ignoring the dimensions, it represents what I have in mind.

And this is the data folder structure:
Data Structure

When I load the dataset, it comes as one “array” with 11104 images. That’s why I use “interval=2776” (I know it’s not how it’s done): the dataset has four parts (left camera, right camera, disparity map and depth map), and 11104 / 4 = 2776.
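Instead of slicing one flat list with a fixed interval, a custom `torch.utils.data.Dataset` can group the four files of each sample by file name. This is only a sketch under assumptions: the folder names `left`, `right`, `disparity` and `depth` are hypothetical placeholders for the real DrivingStereo folder names, and it assumes the four files of one sample share the same name apart from the extension (for example `.jpg` images and `.png` maps). Check both against the actual download, as well as any scale factor the disparity/depth PNGs may use.

```python
import os

import torch
from torch.utils.data import Dataset


def load_image(path: str) -> torch.Tensor:
    """Load an image file as a float tensor shaped (C, H, W)."""
    # Imported lazily so the pairing logic below works without Pillow/NumPy.
    import numpy as np
    from PIL import Image

    array = np.array(Image.open(path), dtype=np.float32)
    if array.ndim == 2:  # single-channel maps (disparity/depth): add channel axis
        array = array[None]
    else:  # RGB images: HWC -> CHW, scaled to [0, 1]
        array = array.transpose(2, 0, 1) / 255.0
    return torch.from_numpy(np.ascontiguousarray(array))


class StereoDepthDataset(Dataset):
    """Returns (left, right, disparity, depth) for files sharing the same stem."""

    def __init__(self, root, folders=("left", "right", "disparity", "depth"), loader=load_image):
        self.dirs = [os.path.join(root, name) for name in folders]
        self.loader = loader
        # Per folder: map "stem" (name without extension) -> actual file name.
        self.indices = [
            {os.path.splitext(f)[0]: f for f in os.listdir(d)} for d in self.dirs
        ]
        # Keep only stems present in every folder, sorted for a stable order.
        common = set(self.indices[0])
        for index in self.indices[1:]:
            common &= set(index)
        self.stems = sorted(common)

    def __len__(self):
        return len(self.stems)

    def __getitem__(self, idx):
        stem = self.stems[idx]
        paths = [os.path.join(d, index[stem]) for d, index in zip(self.dirs, self.indices)]
        left, right, disparity, depth = (self.loader(p) for p in paths)
        return left, right, disparity, depth


# Sanity check of the pairing logic on a throwaway folder tree
# (the loader just returns the path, so no real images are needed).
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for folder, ext in [("left", ".jpg"), ("right", ".jpg"), ("disparity", ".png"), ("depth", ".png")]:
        os.makedirs(os.path.join(tmp, folder))
        for stem in ("0001", "0002"):
            open(os.path.join(tmp, folder, stem + ext), "w").close()
    open(os.path.join(tmp, "left", "0003.jpg"), "w").close()  # unmatched file, ignored
    demo = StereoDepthDataset(tmp, loader=lambda path: path)
    sample = demo[0]
```

With the real data, `StereoDepthDataset(root)` can then be wrapped in a `torch.utils.data.DataLoader` (e.g. `batch_size=4, shuffle=True`); default batching requires all images in a batch to have the same size.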

The dataset can be found here.