Difference between CNN and Autoencoder

Hi, can someone explain the difference between a CNN and an autoencoder to me?

A CNN is used to process an image, i.e. you take an image and pass it through the CNN. For example, you pass images of size 3x224x224 to the CNN (3 = number of channels (RGB), 224x224 = height x width of the image). At the end of the CNN you get a feature map of something like 512x7x7 (which is like an RGB image, except it has 512 channels instead of 3). Everything else is the same: you go from 3 channels to 512 channels, and along the way you reduce the spatial size from 224x224 to 7x7.
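To make the 224x224 -> 7x7 reduction concrete, here is a small sketch that only does the shape arithmetic, assuming a VGG-16-style layout (3x3 convolutions with padding 1 that keep the size, followed by a 2x2 max-pool with stride 2 after each stage). The stage list and function names are illustrative, not taken from any particular library. Five halvings take 224 down to 7 (224 / 2^5 = 7), while the channel count grows to 512.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor division, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed VGG-16-style layout: (number of 3x3 convs, output channels) per stage.
VGG16_STAGES = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def trace_encoder(channels, size, stages=VGG16_STAGES):
    """Return the (C, H, W) shape after each stage, starting with the input."""
    shapes = [(channels, size, size)]
    for num_convs, out_channels in stages:
        for _ in range(num_convs):
            # 3x3 conv with padding 1 and stride 1: spatial size unchanged
            size = conv2d_out(size, kernel=3, stride=1, padding=1)
            channels = out_channels
        # 2x2 max-pool with stride 2: spatial size halved
        size = conv2d_out(size, kernel=2, stride=2)
        shapes.append((channels, size, size))
    return shapes

for shape in trace_encoder(3, 224):
    print("x".join(map(str, shape)))
```

This prints 3x224x224, 64x112x112, 128x56x56, 256x28x28, 512x14x14 and finally 512x7x7.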

Why do we do that? The main idea is that the 512 channels produced at the end carry important information/features about the input image, which allows us to do image recognition, object detection and many other things.

Autoencoders: your aim is to learn the input data distribution. To extend the CNN case from above: after getting the 512x7x7 feature map, we can repeat the process in reverse, i.e. go from 512x7x7 -> 3x224x224, and you have your autoencoder. So, in short, an autoencoder does something like 3x224x224 -> 512x7x7 -> 3x224x224.

Note: It is an oversimplified example.
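As a rough sketch of that round trip, the snippet below traces shapes through a hypothetical encoder of five strided convolutions (kernel 4, stride 2, padding 1, each halving the size) and a mirrored decoder of five transposed convolutions with the same settings (each doubling the size). The output-size formulas are the standard ones for a convolution and a transposed convolution with dilation 1 and no output padding; the channel schedules are assumptions chosen for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a strided convolution (encoder step)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (decoder step), dilation 1, no output padding."""
    return (size - 1) * stride - 2 * padding + kernel

# Hypothetical channel schedules: five downsampling steps, then five upsampling steps.
ENCODER_CHANNELS = [64, 128, 256, 512, 512]
DECODER_CHANNELS = [256, 128, 64, 32, 3]

def trace_autoencoder(channels=3, size=224):
    """Return the (C, H, W) shape after every layer, starting with the input."""
    shapes = [(channels, size, size)]
    for out_channels in ENCODER_CHANNELS:
        size = conv_out(size, kernel=4, stride=2, padding=1)  # halves H and W
        channels = out_channels
        shapes.append((channels, size, size))
    for out_channels in DECODER_CHANNELS:
        size = conv_transpose_out(size, kernel=4, stride=2, padding=1)  # doubles H and W
        channels = out_channels
        shapes.append((channels, size, size))
    return shapes

shapes = trace_autoencoder()
# input -> bottleneck (after the 5 encoder layers) -> reconstruction
print(" -> ".join("x".join(map(str, s)) for s in (shapes[0], shapes[5], shapes[-1])))
```

This prints `3x224x224 -> 512x7x7 -> 3x224x224`: the decoder's only job is to bring the compressed feature map back to the input's shape.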

Why do we do this? Because squeezing the image through the small 512x7x7 bottleneck forces the autoencoder to learn important information about the input distribution, which is what allows it to reconstruct the input image. (This is different from a GAN: a plain autoencoder is not designed to generate new samples; it is trained to reconstruct the kind of data it was trained on.)
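"Reconstruct" is usually measured with a reconstruction loss between the input and the decoder's output; mean squared error is a common choice, though not the only one. A minimal sketch on toy flattened pixel lists (the values are made up for illustration) shows the quantity training tries to drive down:

```python
def mse(original, reconstruction):
    """Mean squared error between two equally sized flat lists of pixel values."""
    if len(original) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)

x = [0.2, 0.5, 0.9, 0.1]               # toy "image" flattened to 4 pixels
x_hat_good = [0.21, 0.48, 0.88, 0.12]  # close reconstruction
x_hat_bad = [0.9, 0.1, 0.2, 0.5]       # poor reconstruction
print(mse(x, x_hat_good) < mse(x, x_hat_bad))  # True
```

Training adjusts the encoder and decoder weights so that this loss becomes small across the training images.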


Thank you for the detailed explanation. I just wanted to know more about this.
