Can we combine two GPUs to run a single big model?

You could use model sharding, i.e. executing different parts of the model on different devices, as given in this example.
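In case it helps, here is a minimal sketch of what model sharding can look like in PyTorch. It is not the linked example: the `ShardedMLP` class, the `pick_devices` helper, and the layer sizes are all made up for illustration. It assumes two GPUs visible as `cuda:0` and `cuda:1`, and falls back to the CPU for both halves if they are not available so the snippet still runs.

```python
import torch
import torch.nn as nn


def pick_devices():
    """Hypothetical helper: use two GPUs if present, otherwise the CPU twice."""
    if torch.cuda.device_count() >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    cpu = torch.device("cpu")
    return cpu, cpu


class ShardedMLP(nn.Module):
    """Toy model whose first half lives on dev0 and second half on dev1."""

    def __init__(self, dev0, dev1, in_features=32, hidden=64, out_features=10):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # Each submodule's parameters are stored on its own device.
        self.part1 = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU()).to(dev0)
        self.part2 = nn.Linear(hidden, out_features).to(dev1)

    def forward(self, x):
        # Move the input to the first device, run the first half there.
        h = self.part1(x.to(self.dev0))
        # Move the intermediate activation to the second device for the rest.
        return self.part2(h.to(self.dev1))


dev0, dev1 = pick_devices()
model = ShardedMLP(dev0, dev1)

x = torch.randn(8, 32)   # batch of 8 inputs, created on the CPU
out = model(x)           # output lives on dev1
loss = out.sum()
loss.backward()          # autograd routes gradients back across devices
```

Note that in this simple form the two devices work one after the other, so it lets you fit a model that is too large for one GPU but does not by itself make it faster.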
