Hi folks, I currently use a fairly primitive solution to match similar images. The goal is to find similar pages of scanned content and tag them based on previously known content. I compute a perceptual hash of every image using ImageHash, store those hashes, and then compute the Hamming distance of each new image against all stored hashes. It works and has reasonably good precision, but it doesn't really scale.
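Roughly, the current approach looks like this (a minimal sketch; the helper names, file paths, and distance threshold here are illustrative, not my exact code):

from PIL import Image
import imagehash

def build_index(paths):
    # One perceptual hash per known page
    return {p: imagehash.phash(Image.open(p)) for p in paths}

def most_similar(index, query_path, max_distance=10):
    query_hash = imagehash.phash(Image.open(query_path))
    # Brute force: one Hamming distance per stored hash, so O(n) per query
    distances = [(p, query_hash - h) for p, h in index.items()]
    return sorted((d for d in distances if d[1] <= max_distance), key=lambda x: x[1])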
I’m just starting with PyTorch, but I found tutorials on using ResNet to extract features from images, so I tried this:
import torch
from PIL import Image
from torchvision import models, transforms

class FeatureExtractor:
    def __init__(self):
        # ResNet-50 with the final fully connected layer removed, so the forward
        # pass returns the pooled feature vector
        self.model = models.resnet50()
        self.model = torch.nn.Sequential(*(list(self.model.children())[:-1]))
        self.model.eval()
        self.preprocess = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

    def extract(self, image_path: str) -> list:
        # Load and preprocess the image (ensure RGB, add batch dimension)
        img = Image.open(image_path).convert("RGB")
        img_t = self.preprocess(img).unsqueeze(0)
        # Disable gradient computation for inference
        with torch.no_grad():
            features = self.model(img_t)  # Features from the truncated ResNet
        # Return the features as a flat list
        return features.numpy().flatten().tolist()
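As a sanity check (the file name below is illustrative), a single call returns a flat 2048-element vector, since ResNet-50's final pooling layer outputs 2048 channels:

extractor = FeatureExtractor()
vec = extractor.extract("page_001.png")
print(len(vec))  # 2048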
So I extracted all my vectors and stored them in a vector database. The problem, however, is that the cosine similarity (sketched after the example below) comes out nearly identical no matter which image I present. For example, considering:

image1: source
image2: very similar to image1
image3: random image with almost no visual features in common

similarity[1,2]: 0.9998799064636614
similarity[1,3]: 0.9987749858928928
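To be clear about how those numbers are obtained, this is essentially the computation (a minimal sketch, assuming the vectors are the lists returned by extract(); the variable names are illustrative):

import numpy as np

def cosine_similarity(a, b):
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# similarity[1,2] = cosine_similarity(vec1, vec2)
# similarity[1,3] = cosine_similarity(vec1, vec3)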
Do you have any suggestions for other approaches I could try? Other models, or other ways of extracting features from these images, that would help me find the ones most similar to a given query image?
Thank you