Training a Variational Autoencoder (VAE) from multiple input tensors

I am new to implementing VAEs. I want to predict an SINR image (RGB) from multiple input tensors: a Euclidean distance image, a 3D distance image, a permittivity image, and a conductivity image (all RGB).

Please let me know if there is any existing work on this topic, or any pointers that could give me a head start. I could not find relevant work online that could help me with the architecture.
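
To make the setup concrete, one possible starting point is sketched below in PyTorch. It is a rough sketch under stated assumptions, not a definitive architecture: the four RGB maps are assumed to share the same 64x64 resolution and are simply concatenated along the channel axis (12 channels), and the decoder is trained to output the SINR image rather than to reconstruct its own input, which departs from a textbook VAE. The names `MultiInputVAE` and `vae_loss`, the layer sizes, the latent dimension, and the `beta` weight are all hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiInputVAE(nn.Module):
    """Sketch: four RGB maps in, one RGB SINR map out (hypothetical sizes)."""

    def __init__(self, n_inputs=4, latent_dim=64, size=64):
        super().__init__()
        in_ch = 3 * n_inputs   # RGB inputs stacked on the channel axis
        self.feat = size // 8  # spatial size after three stride-2 convs
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        flat = 128 * self.feat * self.feat
        self.fc_mu = nn.Linear(flat, latent_dim)
        self.fc_logvar = nn.Linear(flat, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, flat)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            # Sigmoid assumes the target image is scaled to [0, 1]
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, inputs):
        # inputs: list of tensors, each of shape (batch, 3, size, size)
        x = torch.cat(inputs, dim=1)
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)
        h = self.fc_dec(z).view(-1, 128, self.feat, self.feat)
        return self.decoder(h), mu, logvar


def vae_loss(pred, target, mu, logvar, beta=1.0):
    # Reconstruction term against the SINR target, averaged over the batch
    recon = F.mse_loss(pred, target, reduction="sum") / pred.size(0)
    # KL divergence between N(mu, sigma^2) and the standard normal prior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / pred.size(0)
    return recon + beta * kl


# Usage with random stand-in data (not real measurements)
model = MultiInputVAE()
maps = [torch.rand(2, 3, 64, 64) for _ in range(4)]  # the four input images
sinr = torch.rand(2, 3, 64, 64)                      # the target image
pred, mu, logvar = model(maps)
loss = vae_loss(pred, sinr, mu, logvar)
loss.backward()
```

Channel concatenation is only the simplest way to combine the inputs; a separate encoder branch per input, merged before the latent layer, would be an alternative if the maps differ in resolution or scale.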

