Which multi-GPU setups are worthwhile for training VLMs?

Hello everyone,

I have a project to train a VLM. I would like to know whether it is worth getting two GPUs or one high-end GPU.

I saw some setups with two RTX 3060 8 GB cards and was curious whether that would provide enough performance for training.
But looking at GPUs like the RTX 4080, the CUDA core count is almost double.

Another question I have is whether PyTorch allows multi-GPU training with NVIDIA GPUs from different generations, for example an RTX 3060 + RTX 4070, which also differ in VRAM size.

Two 3060s can work, but a single 4080 is faster and less hassle. Mixed GPUs do work in PyTorch, but they don't perform well together: data-parallel training runs at the pace of the slowest card, and the card with the least VRAM caps your per-GPU batch size.
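For the multi-GPU question: the usual approach in PyTorch is `DistributedDataParallel` (DDP), one process per GPU. Below is a minimal sketch; it uses the `gloo` backend on CPU so it runs anywhere, but on real NVIDIA GPUs you would use `nccl`, move the model to each rank's device, and pass `device_ids=[rank]`. The port number and the tiny `Linear` model are placeholders, not anything specific to VLM training.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank: int, world_size: int) -> None:
    # Rendezvous settings; 29500 is just an arbitrary free port.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # "gloo" works on CPU; with NVIDIA GPUs you would use "nccl",
    # call model.to(rank), and wrap with DDP(model, device_ids=[rank]).
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(torch.nn.Linear(10, 2))  # stand-in for a real VLM
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(4, 10)
    y = torch.randint(0, 2, (4,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()  # DDP all-reduces gradients across ranks here
    opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    # One process per device; a mixed 3060 + 4070 pair would sync at the
    # speed of the 3060 on every backward pass.
    mp.spawn(worker, args=(2,), nprocs=2, join=True)
```

In practice people launch this with `torchrun` rather than `mp.spawn`, but the structure is the same: each process owns one GPU, and gradients are averaged across processes after every backward pass, which is why the slower card in a mixed pair drags down the whole run.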