Why do we want many output channels in a Convolutional Neural Network?

Even with a single input channel (such as a grayscale image), the convolutional layers usually produce many output channels before the result is fed into the fully connected layer.
I don’t understand why this is done.

The number of output channels equals the number of kernels (filters) that are convolved over your input volume; each kernel produces one output channel (feature map).
So even if your input image is grayscale (1 channel), many different kernels can each learn to detect a different feature.
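To make this concrete, here is a minimal sketch in plain Python. The two 3x3 kernels are hand-picked edge detectors standing in for learned ones, and the names `KERNELS`, `conv2d_single` and `conv_layer` are made up for this illustration. It shows one grayscale input producing one output channel per kernel, with each channel responding to a different pattern.

```python
# Hand-picked 3x3 kernels (an assumption for illustration; a real network
# would learn its kernel weights during training).
KERNELS = {
    "vertical_edge": [[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]],
    "horizontal_edge": [[-1, -1, -1],
                        [0, 0, 0],
                        [1, 1, 1]],
}


def conv2d_single(image, kernel):
    """Slide one 2D kernel over one 2D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_layer(image, kernels):
    """Apply every kernel to the same input: one output channel per kernel."""
    return {name: conv2d_single(image, k) for name, k in kernels.items()}


# 4x4 grayscale image (1 channel): dark left half, bright right half.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

feature_maps = conv_layer(image, KERNELS)
print(feature_maps["vertical_edge"])    # [[3, 3], [3, 3]]
print(feature_maps["horizontal_edge"])  # [[0, 0], [0, 0]]
```

The same image gives a strong response in the vertical-edge channel and none in the horizontal-edge channel. With only one output channel, the layer could report only one of these patterns.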

Have a look at Stanford’s CS231n course.

Here is an example of 96 learned kernels for a color image by Krizhevsky et al.:

As you can see, the kernels learn to detect edges at many different angles, among other patterns.
Higher layers can then learn more “complex” features based on the activations of the previous layers.
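As a rough sketch of how a higher layer builds on the previous one: a kernel in the next layer spans all channels of its input, so it can combine several simpler features into one. The feature-map values, the 1x1 kernel weights, the bias and the name `conv2d_multi` below are toy choices for illustration, not taken from any trained network.

```python
def conv2d_multi(channels, kernel, bias=0.0):
    """Convolve a multi-channel input with one kernel that has one 2D slice
    per input channel, sum over all channels, add a bias and apply ReLU."""
    kh, kw = len(kernel[0]), len(kernel[0][0])
    out_h = len(channels[0]) - kh + 1
    out_w = len(channels[0][0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for channel, k_slice in zip(channels, kernel):
                for di in range(kh):
                    for dj in range(kw):
                        total += channel[i + di][j + dj] * k_slice[di][dj]
            row.append(max(0.0, total))  # ReLU activation
        out.append(row)
    return out


# Toy activations from a previous layer (made-up numbers):
vertical_map = [[0, 2],
                [0, 2]]    # vertical edge present in the right column
horizontal_map = [[0, 0],
                  [2, 2]]  # horizontal edge present in the bottom row

# One second-layer kernel with a 1x1 slice per input channel.
corner_kernel = [[[1]], [[1]]]

corner_map = conv2d_multi([vertical_map, horizontal_map], corner_kernel, bias=-3.0)
print(corner_map)  # [[0.0, 0.0], [0.0, 1.0]]
```

The output is active only at the position where both edge channels respond, which is a crude "corner" feature. This kind of combination is only possible because the previous layer provided more than one channel.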

I hope that clears things up a bit.