How to use multi-GPU with PyTorch v1.1.0

How should we use multiple GPUs to train our model with the newest stable version? Is it enough to wrap the model in nn.DataParallel, or do we also need to manually convert the BatchNorm layers with nn.SyncBatchNorm?

        model = nn.DataParallel(model)
        model = nn.SyncBatchNorm.convert_sync_batchnorm(model)  # is this necessary?
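
For context, my reading of the 1.1.0 docs is that nn.SyncBatchNorm is only supported together with torch.nn.parallel.DistributedDataParallel (one process per GPU), not with nn.DataParallel. A minimal sketch of that setup would look roughly like the following; build_model() is a placeholder for the real model constructor, and the script is assumed to be launched with torch.distributed.launch:

        import torch
        import torch.nn as nn
        import torch.distributed as dist

        def setup_ddp_model(local_rank):
            # Assumes launch via: python -m torch.distributed.launch --nproc_per_node=4 train.py
            dist.init_process_group(backend="nccl")
            torch.cuda.set_device(local_rank)

            model = build_model().cuda(local_rank)   # build_model() is a placeholder
            # Replace every BatchNorm layer with SyncBatchNorm so the statistics
            # are synchronized across all processes in the default process group.
            model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
            model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
            return model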

Moreover, do we need to set the batch-size to nGPUs times the single-GPU batch-size?
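
For what it's worth, my understanding (an assumption on my part, I have not checked the implementation) is that nn.DataParallel splits each input batch along dim 0 across the visible GPUs, so keeping the per-GPU batch-size the same as in the single-GPU run means multiplying the DataLoader batch-size by the number of GPUs, roughly like this (train_set and the numbers are placeholders):

        import torch
        from torch.utils.data import DataLoader

        n_gpus = torch.cuda.device_count()
        per_gpu_batch = 32                      # whatever fit on a single GPU
        loader = DataLoader(train_set,          # train_set is a placeholder dataset
                            batch_size=per_gpu_batch * n_gpus,
                            shuffle=True,
                            num_workers=4)
        # nn.DataParallel scatters each batch of per_gpu_batch * n_gpus samples
        # into n_gpus chunks, so every replica still sees per_gpu_batch samples.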

Another weird thing: when I used model = nn.DataParallel(model) without changing the batch-size, the work was automatically distributed across 4 GPUs, with the total memory in use being about 2 times the memory of the single-GPU case. The training result is also a little different.
One thing I found is that model = nn.SyncBatchNorm.convert_sync_batchnorm(model) is not necessary when I already have model = nn.DataParallel(model), at least in PyTorch 1.1.0.
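
One quick way to see what convert_sync_batchnorm actually does to the network is to count the module types before and after calling it (a throwaway sanity check I used, nothing official; model here is assumed to be the network you already built):

        import torch.nn as nn

        def count_bn(model):
            # Count plain BatchNorm layers vs. SyncBatchNorm layers in the model.
            plain = sum(isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))
                        for m in model.modules())
            synced = sum(isinstance(m, nn.SyncBatchNorm) for m in model.modules())
            return plain, synced

        print(count_bn(model))                                            # before: (plain, 0)
        print(count_bn(nn.SyncBatchNorm.convert_sync_batchnorm(model)))   # after: (0, synced)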

So this issue now comes down to two questions: how to handle the batch-size with nn.DataParallel, and why the result with nn.DataParallel is different.
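
One hypothesis for the different result (just a guess, not something I have confirmed): under nn.DataParallel every replica computes its BatchNorm statistics only on its own slice of the batch, so the effective BN batch size drops from B to B / nGPUs, which changes both the normalization during training and the running statistics. A tiny illustration with plain BatchNorm (made-up shapes):

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        x = torch.randn(64, 8)        # a batch of 64 samples, 8 features

        bn_full = nn.BatchNorm1d(8)   # sees the whole batch (single-GPU case)
        bn_slice = nn.BatchNorm1d(8)  # sees one quarter of it (one of 4 replicas)

        bn_full(x)
        bn_slice(x[:16])

        # The running statistics (and hence later eval-mode behaviour) differ,
        # because they were estimated from different sample sizes.
        print(torch.allclose(bn_full.running_mean, bn_slice.running_mean))   # typically False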