Let’s say I have two GPUs with 4GB VRAM and I have a 6GB model.
Can I run this model by utilizing the two GPUs?
You could use model sharding, i.e. executing different parts of the model on different devices, as given in this example.
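For reference, here is a minimal sketch of what sharding could look like in PyTorch. It is only an illustration, not the linked example: the `ShardedModel` class, the `pick_devices` helper, and the layer sizes are made up for this post, and it falls back to the CPU when fewer than two GPUs are visible so it runs anywhere.

```python
import torch
import torch.nn as nn


def pick_devices():
    """Hypothetical helper: two GPUs when available, otherwise CPU for both."""
    if torch.cuda.device_count() >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    cpu = torch.device("cpu")
    return cpu, cpu


class ShardedModel(nn.Module):
    """Toy model whose first half lives on dev0 and second half on dev1."""

    def __init__(self, dev0, dev1, in_features=16, hidden=32, out_features=4):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # Each sub-module's parameters are stored on its own device.
        self.part1 = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU()).to(dev0)
        self.part2 = nn.Linear(hidden, out_features).to(dev1)

    def forward(self, x):
        # Run the first half on dev0, then move the activations to dev1.
        x = self.part1(x.to(self.dev0))
        return self.part2(x.to(self.dev1))


dev0, dev1 = pick_devices()
model = ShardedModel(dev0, dev1)
out = model(torch.randn(8, 16))
print(out.shape, out.device)
```

Keep in mind that each GPU has to hold its share of the parameters plus the activations (and gradients and optimizer state, if you train), so whether a 6GB model fits on two 4GB cards depends on where you split it and on the batch size. In this naive form only one GPU is busy at a time, since the second half waits for the output of the first.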