Image captioning

I was reading an image captioning paper and it got me thinking:

  1. Do we need to do object detection and image classification first, and then captioning?
  2. Or can captioning be done without running them as separate steps?
    Thank you in advance for any insight.