Grad of grad fails on multiple GPUs

Grad of grad seems to fail on multiple GPUs with the following error:

RuntimeError: arguments are located on different GPUs at /pytorch/torch/lib/THC/generated/…/generic/THCTensorMathPointwise.cu

Small snippet:

    import torch
    from torch.autograd import Variable

    # interp_points must require grad so torch.autograd.grad can differentiate w.r.t. it
    interp_points = Variable(some_tensor, requires_grad=True)
    errD_interp_vec = netD(interp_points)
    # first-order gradients, kept in the graph for the second backward pass
    errD_gradient, = torch.autograd.grad(errD_interp_vec.sum(), interp_points, create_graph=True)
    lip_est = errD_gradient.view(batch_size, -1).sum(1)
    lip_loss = penalty_weight * ((1.0 - lip_est) ** 2).mean(0).view(1)
    lip_loss.backward()

If the backward is computed directly on netD(interp_points), everything is fine (see the sketch below). netD is wrapped in DataParallel.
Does anyone have any idea what is going wrong?
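
For reference, this is roughly what the working first-order path looks like; a minimal sketch assuming the same netD and interp_points as in the snippet above, with the DataParallel wrapping unchanged:

    # a plain first-order backward through the DataParallel-wrapped netD works fine
    errD_interp_vec = netD(interp_points)
    errD_interp_vec.sum().backward()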
Thanks!

If you give a small script that reproduces this error, I will investigate further.
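
In case it helps, here is a minimal sketch of such a repro script. The toy linear netD, the tensor sizes, and the assumption of two or more visible GPUs are mine, not the original setup; the double-backward structure mirrors the snippet above:

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    batch_size, dim = 8, 16

    # toy discriminator wrapped in DataParallel, spread over the visible GPUs
    netD = nn.DataParallel(nn.Linear(dim, 1)).cuda()

    interp_points = Variable(torch.randn(batch_size, dim).cuda(), requires_grad=True)
    errD_interp_vec = netD(interp_points)

    # first-order gradients, kept in the graph so they can be differentiated again
    errD_gradient, = torch.autograd.grad(errD_interp_vec.sum(), interp_points, create_graph=True)
    lip_est = errD_gradient.view(batch_size, -1).sum(1)
    lip_loss = ((1.0 - lip_est) ** 2).mean(0).view(1)

    # the second backward pass is where the "arguments are located on different GPUs" error reportedly appears
    lip_loss.backward()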