I noticed that LeakyReLU and ReLU are used as activation functions in the model. In the discriminator, Sigmoid is used as the final activation, which makes sense: we need the probability that an image belongs to the real or the fake dataset. But why do we use Tanh in the final layer of the generator, given that its range is [-1, 1]?
It seems to work better.
From the DCGAN paper by Alec Radford, Luke Metz, and Soumith Chintala:
> The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. We observed that using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013).
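A practical consequence worth noting: because the generator ends in Tanh, the real training images must be rescaled to the same [-1, 1] range so that real and generated samples live in the same space. A minimal NumPy sketch of that scaling (illustrative only, not code from the paper):

```python
import numpy as np

# "Pixel" values in [0, 255], as loaded from an image file.
pixels = np.array([0.0, 127.5, 255.0])

# Scale real images to [-1, 1] to match the generator's Tanh output range.
scaled = pixels / 127.5 - 1.0

# The generator's final Tanh maps arbitrary pre-activations into [-1, 1],
# so its outputs are bounded no matter how large the pre-activations get.
pre_activations = np.array([-10.0, 0.0, 10.0])
fake = np.tanh(pre_activations)
assert fake.min() >= -1.0 and fake.max() <= 1.0

# Generated samples are mapped back to [0, 255] for viewing.
restored = (fake + 1.0) * 127.5
```

The boundedness is the point the paper makes: a saturating output lets the model hit the extremes of the color space quickly, whereas an unbounded output would have to learn the range itself.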