Problem with batch size and torch.nn.DataParallel

I am using PyTorch for my project and have run into a strange issue. When I set the batch size in my DataLoader to 16 or 32, training hangs. It only works with a batch size of 10; even smaller batch sizes get stuck. As a test, I removed the torch.nn.DataParallel wrapper from my model, and now it runs for all batch sizes, but only on a single GPU. For my use case I need multiple GPUs. What could DataParallel be doing that causes this, and is there a known fix? Any help would be appreciated, as this is fairly urgent.
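
For reference, here is a minimal sketch of my setup. The model and dataset below are stand-ins (my real ones are larger), and values such as num_workers=4 only approximate what I actually use, but the overall structure is the same:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model; the real one is a larger network.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

device = torch.device("cuda:0")
model = nn.DataParallel(model)  # removing this line makes every batch size work, but on one GPU
model.to(device)

# Stand-in dataset; the real data comes from disk.
dataset = TensorDataset(torch.randn(1000, 128), torch.randint(0, 10, (1000,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# With DataParallel enabled, training hangs somewhere in this loop
# for batch_size 16 or 32; batch_size 10 runs fine.
for inputs, targets in loader:
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()
```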