I was reading an image captioning paper and it got me thinking:
- Do we need to do object detection and image classification first, and then captioning?
- Or can captioning be done without running those as separate steps?
Thanks in advance for any insight.