The impact of the Beta value in the Adam optimizer

Hello all, I went through the StyleGAN2 implementation. For the Adam optimizer, they used Beta_1=0. What's the reason behind this choice? Is it about sample quality or convergence speed?

I guess hyperparameter tuning showed that this setup worked fine, apparently starting with the ProgGAN implementation.

Analyzing and Improving the Image Quality of StyleGAN:

We kept most of the details unchanged […] Adam optimizer [25] with the same hyperparameters (β1 = 0, β2 = 0.99, ε = 10⁻⁸, minibatch = 32) […]

A Style-Based Generator Architecture for Generative Adversarial Networks:

We build upon the official TensorFlow [1] implementation of Progressive GANs by Karras et al. […] In particular, we use the same discriminator architecture, resolution-dependent minibatch sizes, Adam [33] hyperparameters, […]


Progressive Growing of GANs for Improved Quality, Stability, and Variation:

We train the networks using Adam (Kingma & Ba, 2015) with α = 0.001, β1 = 0, β2 = 0.99, and ε = 10⁻⁸.

Thanks @ptrblck for your response. I read these papers, but they don't mention any principled method for choosing these hyperparameters. Should we decide by trial and error? I read in a blog post that choosing a beta close to 1 (e.g. 0.98) speeds up convergence, so why do we have to choose Beta_1=0? And why not Beta_2?

This also confused me for quite a while.

Adam with β1 = 0, β2 = 0.99 is equivalent to RMSprop with alpha=0.99, apart from Adam's bias correction of the second-moment estimate. (Adam ≈ RMSprop with momentum, and β1 = 0 switches the momentum off.)
What the original authors (of Progressive Growing of GANs for Improved Quality, Stability, and Variation, thanks @ptrblck) really needed was RMSprop, and they got it by way of the Adam optimizer.
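
To make the comparison concrete, here is a minimal pure-Python sketch of both update rules on a single scalar parameter. It is only an illustration, not code from any of the papers: the function names and the toy gradient sequence are made up, and the RMSprop form assumed here is the plain one (no momentum, not centered, eps added outside the square root). With β1 = 0 the first-moment estimate is just the current gradient, so the only remaining difference is Adam's bias correction, which fades as training goes on.

```python
import math

def adam_updates(grads, lr=0.001, beta1=0.0, beta2=0.99, eps=1e-8):
    """Per-step update sizes of Adam for a scalar parameter (illustrative helper)."""
    m, v, out = 0.0, 0.0, []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g      # first moment; equals g when beta1 = 0
        v = beta2 * v + (1 - beta2) * g * g  # second moment (running mean of g^2)
        m_hat = m / (1 - beta1 ** t)         # bias correction; a no-op when beta1 = 0
        v_hat = v / (1 - beta2 ** t)         # bias correction for the second moment
        out.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return out

def rmsprop_updates(grads, lr=0.001, alpha=0.99, eps=1e-8):
    """Per-step update sizes of plain RMSprop (assumed form: no momentum, not centered)."""
    v, out = 0.0, []
    for g in grads:
        v = alpha * v + (1 - alpha) * g * g  # same running mean, but no bias correction
        out.append(lr * g / (math.sqrt(v) + eps))
    return out

# Toy gradient sequence, chosen arbitrarily for the illustration.
grads = [math.sin(0.1 * t) + 0.5 for t in range(1, 2001)]
adam = adam_updates(grads)
rms = rmsprop_updates(grads)

print("step 1:   ", adam[0], rms[0])    # differ: Adam's bias correction shrinks its early steps
print("step 2000:", adam[-1], rms[-1])  # nearly identical once beta2**t has decayed
```

So the two optimizers take different steps only early in training, where Adam's update is the RMSprop update scaled by sqrt(1 − β2^t). After a few hundred steps with β2 = 0.99 that factor is practically 1.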