Confused by DataParallel and cuda()

I am confused after modifying the official MNIST example as follows:

First, I added this:
model = Net()
model = nn.DataParallel(model,device_ids=[1]).cuda(gpu)

Then I passed gpu as the argument to every cuda() call below.

But after running the script, I found that two GPUs were in use.

Memory being occupied on a GPU doesn't mean that GPU is used by DataParallel. Judging from the size, I think it is probably just the CUDA context that was created on it.
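A minimal sketch, assuming the goal is to run only on physical GPU 1: setting the CUDA_VISIBLE_DEVICES environment variable before torch initializes CUDA hides the other GPUs from the process, so no context should end up on GPU 0. Inside the process, physical GPU 1 then appears as cuda:0, which is why device_ids=[0] is used below. The small Net is a hypothetical stand-in for the MNIST example's model, and the script falls back to CPU when no GPU is visible.

```python
import os

# Assumption: we want only physical GPU 1. This must be set before torch
# initializes CUDA, so it comes before `import torch`.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # Hypothetical stand-in for the MNIST example's Net; the real layers differ.
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return F.log_softmax(self.fc(x.flatten(1)), dim=1)


# With CUDA_VISIBLE_DEVICES="1", physical GPU 1 is visible as cuda:0.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# DataParallel expects the module's parameters on device_ids[0] before wrapping.
model = Net().to(device)
if device.type == "cuda":
    model = nn.DataParallel(model, device_ids=[0])

# Inputs go to the same device instead of calling cuda(gpu) everywhere.
data = torch.randn(4, 1, 28, 28, device=device)
output = model(data)
print(output.shape)  # torch.Size([4, 10])
```

Using a single torch.device object and .to(device) keeps every tensor on the same GPU without repeating a device index in each cuda() call.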

Thank you!
If I change device_ids to [0,1], will the computation run on both GPUs?

With batch size >= 2, yes.
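Why the batch size matters can be sketched in plain Python. DataParallel splits each input batch along dimension 0 into pieces, one per device, and only uses as many replicas as there are pieces. The hypothetical helper split_batch below mimics the documented splitting rule of torch.chunk (chunk size is the ceiling of batch_size / num_devices, so fewer chunks than requested may come back); it is an illustration, not PyTorch's actual scatter code.

```python
import math


def split_batch(batch_size, num_devices):
    """Hypothetical helper mimicking torch.chunk on dim 0: each piece has
    ceil(batch_size / num_devices) samples, except possibly the last."""
    chunk = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


for bs in (1, 2, 5, 64):
    sizes = split_batch(bs, num_devices=2)
    # One piece per GPU that actually receives work.
    print(f"batch={bs}: chunk sizes {sizes}, GPUs used: {len(sizes)}")
# batch=1: chunk sizes [1], GPUs used: 1
# batch=2: chunk sizes [1, 1], GPUs used: 2
# batch=5: chunk sizes [3, 2], GPUs used: 2
# batch=64: chunk sizes [32, 32], GPUs used: 2
```

With a batch of 1 there is only one piece, so only the first device does any computation.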