cuda() and DataParallel

Do we need to call cuda() for model and data if we use DataParallel?

Say we have four GPUs; specifically, there are three questions:

a. If we do not call cuda(), the model and data stay on the CPU. Will there be any time inefficiency when they are replicated to the 4 GPUs?

b. If we call cuda(), the model and data are placed on GPU #1. Will there be any space inefficiency from replicating them again on GPU #1, or will they not be replicated there if they are already on that device?

c. Overall, for time/space efficiency, should we call cuda() if we use DataParallel?
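For context, here is a minimal sketch of the usage pattern the questions refer to. The toy model and shapes are hypothetical; the point is the order of operations: move the model to the default GPU first (when one is available), then wrap it in DataParallel, which scatters replicas to the visible GPUs on each forward pass.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, just for illustration.
model = nn.Linear(10, 2)

# Common pattern: put the model on the default GPU first, then wrap it.
# DataParallel keeps this copy as the "master" and replicates it to the
# other visible GPUs during each forward pass.
if torch.cuda.is_available():
    model = model.cuda()            # parameters now live on GPU 0
model = nn.DataParallel(model)      # falls back to plain CPU execution if no GPUs

# DataParallel scatters the input batch across GPUs itself; moving the
# input to the master device first avoids an extra host-to-device copy.
x = torch.randn(8, 10)
if torch.cuda.is_available():
    x = x.cuda()
out = model(x)
print(tuple(out.shape))
```

On a machine with no GPUs this still runs, since DataParallel with an empty device list simply calls the wrapped module directly.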