How to batch a stereo image dataset?

Hello guys. I’ve watched many YouTube tutorials and read many posts, but I can’t figure out how to load my data properly. I’m working with the DrivingStereo dataset. This is the structure of the dataset folder.
[Image: dataset folder structure]

When I load it with ImageFolder, it comes back as a single “array” with 11104 positions.

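import torchvision
from torchvision import transforms

train_path = './train'

# Resize, convert to a tensor, then reduce to a single grayscale channel
transformations = transforms.Compose([
    transforms.Resize((400, 800)),
    transforms.ToTensor(),
    transforms.Grayscale(1),
])

# ImageFolder treats each subfolder of ./train as a class
train_dataset = torchvision.datasets.ImageFolder(root=train_path, transform=transformations)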

When I print some info about the data, I get these results:

print(len(train_dataset))
print(len(train_dataset[0]))
print(len(train_dataset[0][0]))
print(train_dataset[0][0].type())
print(train_dataset[0][0].shape)

11104
2
1
torch.FloatTensor
torch.Size([1, 400, 800])

The related images sit 2776 positions apart (11104 / 4), at these positions:

import matplotlib.pyplot as plt

left_image = train_dataset[0][0].clone().detach()
plt.figure(figsize=(5,10))
plt.imshow(left_image[0],cmap='gray')

right_image = train_dataset[2776][0].clone().detach()
plt.figure(figsize=(5,10))
plt.imshow(right_image[0],cmap='gray')

disparity_map = train_dataset[5552][0].clone().detach()
plt.figure(figsize=(5,10))
plt.imshow(disparity_map[0],cmap='gray')

depth_map = train_dataset[8328][0].clone().detach()
plt.figure(figsize=(5,10))
plt.imshow(depth_map[0],cmap='gray')

How can I batch this data, using the disparity map or the depth map as the “label”, so that it serves as the expected output of a network that receives the left and right images as input?
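
To make the question concrete, here is a minimal sketch of the regrouping I think I need. It assumes the flat dataset is made of four equally sized blocks in the order left, right, disparity, depth (as the indices 0, 2776, 5552, 8328 above suggest) and that the files inside each block are in the same order. The class name PairedStereoDataset and its parameters are hypothetical, not part of torchvision.

```python
class PairedStereoDataset:
    """Regroups a flat dataset made of equally sized blocks into
    (left, right, target) tuples. Hypothetical helper, not a torchvision class."""

    def __init__(self, flat_dataset, num_blocks=4,
                 left_block=0, right_block=1, target_block=2):
        # Assumption: every block (subfolder) holds the same number of images,
        # and item i of each block belongs to the same scene.
        if len(flat_dataset) % num_blocks != 0:
            raise ValueError("dataset length must be divisible by num_blocks")
        self.flat = flat_dataset
        self.block_size = len(flat_dataset) // num_blocks
        self.left_block = left_block
        self.right_block = right_block
        self.target_block = target_block  # 2 = disparity, 3 = depth (assumed order)

    def __len__(self):
        return self.block_size

    def __getitem__(self, i):
        if not 0 <= i < self.block_size:
            raise IndexError(i)
        # Each flat item is (image, class_index); keep only the image.
        left = self.flat[self.left_block * self.block_size + i][0]
        right = self.flat[self.right_block * self.block_size + i][0]
        target = self.flat[self.target_block * self.block_size + i][0]
        return left, right, target
```

If this is the right idea, PairedStereoDataset(train_dataset) would have 2776 items, and passing it to torch.utils.data.DataLoader with a batch_size should yield batches of left, right, and target tensors, since DataLoader only needs __getitem__ and __len__ for a map-style dataset. Setting target_block=3 would use the depth map as the label instead.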

Here is the link to my notebook:

https://colab.research.google.com/drive/107f8365tHXOZDPp6z6O7QxZF-oVCRShn?usp=sharing

Thank you so much!