Improved WGAN: Scatter is not differentiable twice

Certainly! The discriminator takes two inputs and concatenates them along the channel dimension. It is a sequence of convolutions with a Linear layer at the end.
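A minimal sketch of that kind of discriminator, together with a standard WGAN-GP gradient penalty, might look like the following. Everything here is an assumption for illustration: the class name `PairDiscriminator`, the layer sizes, the 32x32 input size, and the choice to interpolate only the second input (treating the first as a shared condition) are hypothetical. The `create_graph=True` call is the step that needs every op in the forward pass to be differentiable twice. This runs on a single device and does not reproduce or fix the DataParallel `Scatter` error.

```python
import torch
import torch.nn as nn


class PairDiscriminator(nn.Module):
    """Hypothetical critic: concatenates two inputs on the channel axis,
    applies a stack of convolutions, then a final Linear layer."""

    def __init__(self, in_channels: int = 3, image_size: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2 * in_channels, 32, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
        )
        # Two stride-2 convolutions shrink each spatial side by a factor of 4.
        self.fc = nn.Linear(64 * (image_size // 4) ** 2, 1)

    def forward(self, a, b):
        x = torch.cat([a, b], dim=1)  # (N, 2C, H, W)
        x = self.features(x)
        return self.fc(x.flatten(1))


def gradient_penalty(critic, cond, real, fake):
    """WGAN-GP penalty on random interpolates of the second input
    (assumption: the first input is a condition shared by real and fake)."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(cond, interp)
    (grads,) = torch.autograd.grad(
        outputs=scores,
        inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,  # keep the graph so the penalty can be backpropagated
    )
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()


torch.manual_seed(0)
critic = PairDiscriminator()
cond = torch.randn(4, 3, 32, 32)
real = torch.randn(4, 3, 32, 32)
fake = torch.randn(4, 3, 32, 32)
gp = gradient_penalty(critic, cond, real, fake)
gp.backward()  # second-order backward through the critic
```

On one device this double backward goes through; wrapping the critic in `nn.DataParallel` adds the scatter/gather steps to the graph, which is where the error in the title would come from.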
There has been a similar question here:

I’m also running my models in DataParallel and can explicitly post the code if need be.