Tutorials on using ResNet-50 and a Transformer for image captioning

Hi, can anybody recommend a simple, useful tutorial on using ResNet-50 and a Transformer for image captioning?