Manually managing which modules go on which GPU is tedious and often suboptimal, and data parallelism is memory-inefficient because every device holds a full copy of the model. With NVLink providing a high-bandwidth inter-GPU interconnect, is it possible to abstract away the individual GPU devices and expose them as a single unified compute pool?
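For context, here is a minimal sketch of the manual placement I mean (the `ManuallySharded` model is hypothetical, just for illustration): every split point and every cross-device copy has to be written by hand, and the code falls back to CPU when fewer than two GPUs are present.

```python
import torch
import torch.nn as nn

class ManuallySharded(nn.Module):
    """Hypothetical two-stage model split by hand across two GPUs."""

    def __init__(self):
        super().__init__()
        self.stage0 = nn.Linear(1024, 1024)
        self.stage1 = nn.Linear(1024, 10)
        # Manual, hard-coded placement decisions per module.
        if torch.cuda.device_count() >= 2:
            self.stage0.to("cuda:0")
            self.stage1.to("cuda:1")

    def forward(self, x):
        x = self.stage0(x)
        # Explicit activation transfer at every hand-chosen split point.
        if torch.cuda.device_count() >= 2:
            x = x.to("cuda:1")
        return self.stage1(x)

model = ManuallySharded()
out = model(torch.randn(4, 1024))
print(out.shape)
```

A unified pool would hide both the `.to(...)` placement calls and the inter-stage copies behind a single logical device.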