Appearance-based hashing for similarity detection: picking the 100 most distinct images out of 500

I would like to use appearance-based hashing for similarity detection. I have 500 photos in each of my categories, but I only want to keep the 100 that are most distinct from one another. How should I go about this? Are there well-known baselines for this?
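
To make clear what I mean by appearance-based hashing, here is a minimal sketch of an average hash (aHash), which is just one simple perceptual hash and not necessarily the right one for this task. It assumes each image is already loaded as a 2-D grayscale NumPy array at least `hash_size` pixels on each side; the function names `average_hash` and `hamming` are my own, not from any library.

```python
import numpy as np

def average_hash(gray, hash_size=8):
    """Return hash_size*hash_size boolean bits for a 2-D grayscale image.

    Assumes the image is at least hash_size pixels in each dimension.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    # Crop so both sides divide evenly, then average each block
    # to shrink the image to hash_size x hash_size.
    h_crop, w_crop = h - h % hash_size, w - w % hash_size
    blocks = gray[:h_crop, :w_crop].reshape(
        hash_size, h_crop // hash_size, hash_size, w_crop // hash_size
    )
    small = blocks.mean(axis=(1, 3))
    # One bit per cell: is it brighter than the overall mean?
    return (small > small.mean()).ravel()

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))
```

Two images with a small Hamming distance between their hashes would be treated as near-duplicates; the threshold for "small" is something I would have to tune.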

Also, what other methods are known for this task? I was thinking I could run ResNet-50 on the images, extract a 2048-dimensional feature vector for each, and compare them with cosine similarity, but I wonder whether there are deep neural networks that can do this task end-to-end or more efficiently than what I have in mind.
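
For the embedding idea, this is roughly the selection step I have in mind, as a sketch only. It assumes the feature vectors have already been extracted into an `(n, d)` NumPy array (the ResNet-50 step is left out), and it uses greedy farthest-point selection on cosine distance, which is my own choice of heuristic rather than an established baseline. The name `select_most_distinct` and the choice of starting from the first image are arbitrary.

```python
import numpy as np

def select_most_distinct(features, k):
    """Greedily pick k row indices that are far apart in cosine distance."""
    feats = np.asarray(features, dtype=np.float64)
    n = feats.shape[0]
    k = min(k, n)
    if k <= 0:
        return []
    # L2-normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    feats = feats / np.clip(norms, 1e-12, None)

    selected = [0]  # arbitrary starting image (assumption)
    # min_dist[i] = cosine distance from image i to its nearest selected image
    min_dist = 1.0 - feats @ feats[0]
    min_dist[0] = -np.inf  # never pick the same image twice
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))  # image farthest from everything chosen
        selected.append(nxt)
        min_dist = np.minimum(min_dist, 1.0 - feats @ feats[nxt])
        min_dist[nxt] = -np.inf
    return selected

# Random vectors standing in for real 2048-dimensional features.
rng = np.random.default_rng(0)
keep = select_most_distinct(rng.normal(size=(500, 2048)), 100)
```

With 500 images per category this costs about 100 matrix-vector products, so efficiency is not really the issue at this scale; my question is more whether the result is a sensible notion of "most distinct".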