Can we concatenate two GPUs to run a single big model?

Let’s say I have two GPUs with 4GB of VRAM each and a 6GB model.

Can I run this model by utilizing the two GPUs?

You could use model sharding, i.e. executing different parts of the model on different devices, as given in this example.
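In case it helps, here is a minimal sketch of what sharding could look like in PyTorch. The `ShardedModel` class, its layer sizes, and the `stage0`/`stage1` names are made up for illustration and are not taken from your model. The idea is just that each part of the model lives on its own device and the activations are moved between them in `forward`. The snippet falls back to the CPU if fewer than two GPUs are visible, so it should still run as written.

```python
import torch
import torch.nn as nn

# Use two GPUs when available; otherwise fall back to CPU so the sketch still runs.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")


class ShardedModel(nn.Module):
    """Hypothetical two-stage model with each stage placed on its own device."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # First part of the model: parameters are stored on dev0.
        self.stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to(dev0)
        # Second part of the model: parameters are stored on dev1.
        self.stage1 = nn.Linear(4096, 10).to(dev1)

    def forward(self, x):
        x = self.stage0(x.to(self.dev0))
        # Move the intermediate activations to the second device.
        return self.stage1(x.to(self.dev1))


model = ShardedModel(dev0, dev1)
out = model(torch.randn(8, 1024))
print(out.shape, out.device)
```

Note that with this naive split the two GPUs work one after the other rather than in parallel, and the activations (plus gradients and optimizer states, if you train) need memory as well, so a 6GB model on 2x 4GB could still be tight.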
