Possible Issue with batch norm train/eval modes

I am currently running into a similar issue; however, I see the effects in .train() mode instead (POST).

It is definitely a batchnorm issue: forcibly removing the running means seems to solve it. Nevertheless, there doesn't seem to be any clear solution for training on different distributions simultaneously. I was pre-training my discriminator on "fake only" and "real only" batches, and it failed to get anywhere because of the batchnorm running estimates.
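To illustrate why single-distribution batches can cause trouble, here is a toy pure-Python sketch. It is not PyTorch code: `batch_norm_train` and `update_running_mean` are hypothetical helpers that only mimic BatchNorm's train-mode normalization (per-batch mean and biased variance, no affine parameters) and its running-mean update, `running = (1 - momentum) * running + momentum * batch_mean`, using PyTorch's default momentum of 0.1. The 1-D "real" and "fake" feature values are made up for the example.

```python
import statistics

def batch_norm_train(batch, eps=1e-5):
    """Normalize a 1-D batch with its own statistics, like BatchNorm in train mode (no affine)."""
    mean = statistics.fmean(batch)
    var = statistics.pvariance(batch, mu=mean)  # biased variance, as used for normalization
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

def update_running_mean(running, batch, momentum=0.1):
    """Exponential moving average of batch means (momentum 0.1 mirrors PyTorch's default)."""
    return (1 - momentum) * running + momentum * statistics.fmean(batch)

# Hypothetical 1-D features: real samples centered at +2, fake samples centered at -2.
real = [1.5, 2.0, 2.5, 2.0]
fake = [-1.5, -2.0, -2.5, -2.0]

# Separate batches: each is re-centered on its own mean, so the +2 / -2 offset is erased
# and a discriminator sees both batches centered at zero.
real_out = batch_norm_train(real)
fake_out = batch_norm_train(fake)
print(statistics.fmean(real_out), statistics.fmean(fake_out))

# Mixed batch: one shared mean, so real and fake stay on opposite sides of zero.
mixed_out = batch_norm_train(real + fake)
print(statistics.fmean(mixed_out[:4]) > 0, statistics.fmean(mixed_out[4:]) < 0)

# Alternating single-distribution batches: the running mean settles near zero,
# matching neither the real (+2) nor the fake (-2) distribution used at eval time.
running = 0.0
for _ in range(50):
    running = update_running_mean(running, real)
    running = update_running_mean(running, fake)
print(round(running, 3))
```

Under these assumptions, the train-mode outputs of the separate batches both have zero mean, so the feature that distinguishes real from fake disappears, while the running mean drifts toward a compromise that fits neither distribution. Mixing real and fake samples in the same batch keeps the offset visible.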