Hello all, I went through the StyleGAN2 implementation. In the Adam optimizer, they used Beta_1 = 0. What's the reason behind this choice? Is it about sample quality or convergence speed?
I guess hyperparameter tuning showed that this setup works well; it apparently goes back to the ProgGAN implementation.
We kept most of the details unchanged […] Adam optimizer with the same hyperparameters (β1 = 0, β2 = 0.99, ε = 10⁻⁸, minibatch = 32) […]
We build upon the official TensorFlow implementation of Progressive GANs by Karras et al. […] In particular, we use the same discriminator architecture, resolution-dependent minibatch sizes, Adam hyperparameters, […]
We train the networks using Adam (Kingma & Ba, 2015) with α = 0.001, β1 = 0, β2 = 0.99, and ε = 10⁻⁸.
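In PyTorch those values map directly onto `torch.optim.Adam`. A minimal sketch, where `model` is just a stand-in for the actual StyleGAN2 networks:

```python
import torch

# Stand-in module; in StyleGAN2 this would be the generator or discriminator.
model = torch.nn.Linear(512, 512)

# Adam with the hyperparameters quoted above:
# beta1 = 0 turns off the first-moment (momentum) estimate,
# beta2 = 0.99 keeps a shorter second-moment average than the default 0.999.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,            # alpha = 0.001
    betas=(0.0, 0.99),  # (beta1, beta2)
    eps=1e-8,
)
```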
Thanks @ptrblck for your response. I read these papers, but they don't mention any scientific method for choosing these hyperparameters. Should we decide by trial and error? I read in a blog post that choosing beta close to 1 (e.g. 0.98) speeds up convergence, so why do we have to choose Beta_1 = 0? And why not Beta_2?
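To make my question concrete, here is a toy scalar sketch of the Adam update as I understand it (my own illustration, not code from the papers), showing what β1 = 0 does:

```python
import math

def adam_update(g, m, v, t, lr=1e-3, beta1=0.0, beta2=0.99, eps=1e-8):
    """One Adam step for a scalar gradient g (toy illustration only)."""
    m = beta1 * m + (1 - beta1) * g     # beta1 = 0  =>  m = g: no momentum at all
    v = beta2 * v + (1 - beta2) * g**2  # running average of squared gradients
    m_hat = m / (1 - beta1**t)          # bias correction (a no-op when beta1 = 0)
    v_hat = v / (1 - beta2**t)
    return -lr * m_hat / (math.sqrt(v_hat) + eps), m, v
```

So with β1 = 0 the step direction is just the current gradient rescaled by the second-moment estimate, which looks close to RMSprop. My question is why discarding the gradient averaging helps here.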