VAE and VAEGAN always produce the same image. Mode collapse?

I have tried to implement a standard VAE and the CVAEGAN model (without the class labels) on frames from the Atari game Breakout. For some reason, both models always output the same image, which looks like the average of the frames in the training set. It seems this may be the mode collapse problem, but I am unsure how to rectify it.
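
For reference, here is a minimal sketch of how the "looks like the average" observation could be checked numerically. It is not taken from my training code: the function name `mean_frame_similarity` and the array layout (frames and outputs as NumPy arrays scaled to [0, 1], with the sample index on the first axis) are assumptions made for illustration.

```python
import numpy as np

def mean_frame_similarity(frames, outputs):
    """Compare model outputs against the pixel-wise mean of the training frames.

    frames:  array of shape (N, H, W) or (N, H, W, C), assumed scaled to [0, 1]
    outputs: array of shape (M, H, W) or (M, H, W, C), same scaling (assumed)
    Returns (mse_to_mean, output_variance).
    """
    frames = np.asarray(frames, dtype=np.float64)
    outputs = np.asarray(outputs, dtype=np.float64)

    # Pixel-wise average over all training frames.
    mean_frame = frames.mean(axis=0)

    # Mean squared distance between each output and the mean frame.
    mse_to_mean = float(np.mean((outputs - mean_frame) ** 2))

    # Variance across outputs at each pixel, averaged over pixels.
    # A value near zero means the outputs are almost identical to each other.
    output_variance = float(outputs.var(axis=0).mean())

    return mse_to_mean, output_variance
```

If both numbers are close to zero for outputs generated from different inputs or different latent samples, the model is effectively ignoring its input and emitting the dataset mean.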

Any ideas?