Flickr dataset input for Image Captioning

The Flickr dataset provides 5 captions for every image.
When implementing the dataset's __getitem__(), should I return image X together with a list of lists of caption ids Y?

I’m not familiar with the Flickr dataset. Are these captions encoded with integer values or do you want to return the strings in a list?
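If the captions are already encoded as integer ids, returning the image together with all of its captions could look roughly like the sketch below. It is written in plain Python so it runs without dependencies; in PyTorch the class would subclass torch.utils.data.Dataset. The class name `FlickrCaptions`, the `load_image` callable, and the toy ids are all made up for illustration and are not part of any Flickr loader.

```python
class FlickrCaptions:
    """Sketch of a map-style dataset: one item = one image plus all of its captions.

    In PyTorch this would subclass torch.utils.data.Dataset.
    """

    def __init__(self, image_ids, captions, load_image):
        # image_ids: list of image identifiers (hypothetical naming)
        # captions: dict mapping image id -> list of token-id lists (5 per image)
        # load_image: callable that turns an image id into an image object or tensor
        self.image_ids = image_ids
        self.captions = captions
        self.load_image = load_image

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, index):
        image_id = self.image_ids[index]
        image = self.load_image(image_id)
        # Y is a list of lists: one list of token ids per caption.
        return image, self.captions[image_id]


# Toy data standing in for real images and a real vocabulary.
toy_captions = {
    "img_0": [[1, 4, 5, 2], [1, 6, 2], [1, 4, 7, 8, 2], [1, 9, 2], [1, 4, 2]],
    "img_1": [[1, 3, 2], [1, 3, 5, 2], [1, 8, 2], [1, 7, 7, 2], [1, 6, 5, 2]],
}
dataset = FlickrCaptions(
    ["img_0", "img_1"],
    toy_captions,
    load_image=lambda image_id: f"pixels of {image_id}",  # placeholder loader
)
image, caption_ids = dataset[0]
```

Because the captions have different lengths, a DataLoader would most likely need a custom collate_fn that pads them before batching. Another option is to index (image, caption) pairs instead, so that each item carries a single caption.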

I am trying to deploy my image captioning project on the web, so I need to load the model weights. I trained for 20 epochs and ended up with 20 model weight files in total. When I load the model in another file, do I need all of the weight files, or only the last one I got during training?
Please reply.

Usually you would use the checkpoint with the lowest validation loss, which you should save during training.

How do I save it? Can you please provide the code? And how do I check for the lowest validation loss?

Have a look at the ImageNet example to see how to save the checkpoint for the best epoch.
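As a rough illustration of the idea (this is not the ImageNet example's code), the sketch below computes the validation loss after every epoch and overwrites a single checkpoint file only when that loss improves. The tiny linear model, the random data, the `validate` helper, and the file name `best_checkpoint.pt` are placeholders; a captioning model and real DataLoaders would take their place.

```python
import torch
from torch import nn


def validate(model, inputs, targets, loss_fn):
    """Return the validation loss as a Python float (hypothetical helper)."""
    model.eval()
    with torch.no_grad():
        return loss_fn(model(inputs), targets).item()


torch.manual_seed(0)
model = nn.Linear(4, 1)  # stand-in for the captioning model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Random tensors standing in for the training and validation sets.
train_x, train_y = torch.randn(32, 4), torch.randn(32, 1)
val_x, val_y = torch.randn(16, 4), torch.randn(16, 1)

best_val_loss = float("inf")
best_path = "best_checkpoint.pt"  # assumed file name

for epoch in range(20):
    # One (full-batch) training step per epoch, for brevity.
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    optimizer.step()

    # Check the validation loss and keep only the best checkpoint so far.
    val_loss = validate(model, val_x, val_y, loss_fn)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(
            {
                "epoch": epoch,
                "model_state_dict": model.state_dict(),
                "val_loss": val_loss,
            },
            best_path,
        )

# At deployment time, load only this one file.
checkpoint = torch.load(best_path)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```

With this pattern there is only one weights file to ship for deployment, and the stored "epoch" and "val_loss" entries show which epoch it came from.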