nn.DataParallel overhead

I am running a model from https://github.com/NVIDIA/vid2vid.

There seems to be a significant memory overhead on the first GPU in comparison to the others.

The following is the GPU memory usage (I am training on 4 × 12 GB GeForce GTX TITAN X):
memory.used [MiB], memory.free [MiB]
11385 MiB, 822 MiB
2976 MiB, 9231 MiB
2976 MiB, 9231 MiB
2976 MiB, 9231 MiB

I am wondering whether this could be caused by nn.DataParallel?
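For context, this is roughly how I understand the wrapping to work (a minimal sketch with a toy model standing in for the vid2vid networks, not the actual vid2vid code): nn.DataParallel scatters each batch across the GPUs in device_ids, but gathers all outputs back onto output_device, which defaults to the first GPU. The gathered outputs, and any loss computed on them, then live on GPU 0, which would explain the imbalance.

```python
import torch
import torch.nn as nn

# Hypothetical small model standing in for the vid2vid generator.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))

# nn.DataParallel replicates the module on each GPU in device_ids and
# scatters the batch, but gathers every replica's output back onto
# output_device (GPU 0 by default). Those gathered outputs, plus the
# loss computed on them, occupy extra memory on the first GPU.
if torch.cuda.is_available():
    model = nn.DataParallel(model, device_ids=[0, 1, 2, 3]).cuda()

x = torch.randn(16, 64)
if torch.cuda.is_available():
    x = x.cuda()

out = model(x)
print(out.shape)
```

If that is the cause, one workaround I have seen suggested is computing the loss inside the wrapped module's forward, so that only scalar losses (rather than full outputs) are gathered on GPU 0.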