Use memory of all GPUs

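Yes, with either model parallelism or data parallelism. Model parallelism splits the model itself across the GPUs, so their combined memory holds one model; data parallelism keeps a full copy of the model on each GPU and splits each batch between them. See also the thread "Model parallelism in Multi-GPUs: forward/backward graph".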
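
To make the difference concrete, here is a rough, framework-agnostic sketch in plain Python. The thread does not name a framework, so nothing here is a real library API: `partition`, `split_batch`, and `assign_layers` are hypothetical helpers, and the layer names are made up for illustration. It only shows what gets divided in each scheme, not how tensors are actually moved to devices.

```python
def partition(items, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for part in range(n_parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if part < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def split_batch(batch, n_gpus):
    """Data parallelism: every GPU holds the whole model and gets a slice of the batch."""
    return partition(batch, n_gpus)


def assign_layers(layers, n_gpus):
    """Model parallelism: every GPU holds a slice of the layers and sees the whole batch."""
    return partition(layers, n_gpus)


if __name__ == "__main__":
    samples = list(range(8))                          # stand-in for one batch
    layers = ["conv1", "conv2", "fc1", "fc2", "out"]  # hypothetical layer names

    print(split_batch(samples, 2))
    print(assign_layers(layers, 2))
```

With data parallelism, per-GPU activation memory shrinks because each GPU sees fewer samples, but the weights are duplicated on every GPU. With model parallelism the weights themselves are spread out, which is what you need when the model does not fit on a single GPU.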