Beginner tutorial on similar image classification

Hi folks, I currently use a very primitive solution to match similar images. The goal is to find similar pages of scanned content and tag them based on previously known content. I compute a perceptual hash of every image using ImageHash, store the hashes, and then compute the Hamming distance between the new image's hash and every stored hash. It works and has relatively good precision, but it doesn't really scale, since each new image has to be compared against the whole collection.
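The matching step described above is a linear scan over every stored hash. A minimal sketch, assuming each hash is kept as a 64-bit integer (ImageHash's `phash` yields 64 bits with its default `hash_size=8`, and subtracting two `ImageHash` objects gives the same Hamming distance). The function names, the 10-bit threshold, and the example hashes are hypothetical:

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


def find_matches(query: int, stored: dict[str, int], max_distance: int = 10) -> list[tuple[str, int]]:
    """Compare the query hash against every stored hash (O(N) per query)
    and return (name, distance) pairs within max_distance, closest first."""
    results = []
    for name, h in stored.items():
        d = hamming_distance(query, h)
        if d <= max_distance:
            results.append((name, d))
    results.sort(key=lambda item: item[1])
    return results


# Hypothetical 64-bit hashes for three previously tagged pages
stored_hashes = {
    "page_a": 0xF0F0F0F0F0F0F0F0,
    "page_b": 0xF0F0F0F0F0F0F0F1,
    "page_c": 0x0F0F0F0F0F0F0F0F,
}
print(find_matches(0xF0F0F0F0F0F0F0F0, stored_hashes))
# prints [('page_a', 0), ('page_b', 1)]
```

Every query touches every stored hash, so the cost of a lookup grows linearly with the number of known pages.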

I'm just getting started with PyTorch, but I found tutorials on using ResNet to extract features from images, so I tried that:

```python
import torch
from PIL import Image
from torchvision import models, transforms


class FeatureExtractor:
    def __init__(self):
        # No weights argument is passed, so this ResNet-50 is randomly initialized.
        # With torchvision >= 0.13, ImageNet-pretrained weights are loaded via
        # models.resnet50(weights=models.ResNet50_Weights.DEFAULT).
        self.model = models.resnet50()
        # Drop the final fully connected layer; the output is the
        # 2048-d globally average-pooled feature map.
        self.model = torch.nn.Sequential(*(list(self.model.children())[:-1]))
        self.model.eval()
        # Standard ImageNet preprocessing
        self.preprocess = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])

    def extract(self, image_path: str) -> list[float]:
        # Load the image, ensure it is RGB, preprocess it, and add a batch dimension
        img = Image.open(image_path).convert("RGB")
        img_t = self.preprocess(img).unsqueeze(0)

        # Disable gradient computation for inference
        with torch.no_grad():
            features = self.model(img_t)  # shape (1, 2048, 1, 1)

        # Return the features as a flat list of 2048 floats
        return features.numpy().flatten().tolist()
```
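Matching a new page against the stored vectors then comes down to ranking them by cosine similarity. A minimal in-memory sketch of that query, standing in for the vector database (which database is used isn't stated; the function name and the toy 3-d data are hypothetical):

```python
import numpy as np


def top_k_cosine(query: np.ndarray, stored: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the k stored vectors most
    similar to `query`, highest similarity first. This is an exhaustive scan."""
    q = query / np.linalg.norm(query)
    s = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    sims = s @ q  # one cosine similarity per stored row
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]


# Toy 3-d vectors standing in for stored 2048-d feature vectors
stored = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
])
print(top_k_cosine(np.array([1.0, 0.1, 0.0]), stored, k=2))
```

Normalizing both sides first turns cosine similarity into a plain dot product, which is why many vector databases store unit-length vectors when the cosine metric is used.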

I extracted vectors for all my images and stored them in a vector database. The problem is that the cosine similarity comes out very close to 1 for whatever image I present. For example, given:
image1: source
image2: very similar to image1
image3: random image with almost no visually identical features

similarity[1,2]: 0.9998799064636614
similarity[1,3]: 0.9987749858928928
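For reference, cosine similarity is cos(a, b) = a·b / (|a| |b|). The sketch below uses synthetic vectors (an assumption for illustration, not real ResNet features) to show one way scores can crowd near 1: the pooled ResNet-50 features come out of a ReLU, so every entry is non-negative, and when most of each vector's magnitude is a component shared by all images, every pair scores close to 1. Subtracting the mean vector of the collection removes that shared part; whether this helps on real scans is not tested here.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(a, b) = a.b / (|a| |b|)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
dim = 2048  # size of the pooled ResNet-50 feature vector

# Synthetic, non-negative "features": a large component shared by every
# image plus a small image-specific part. NOT real ResNet outputs.
shared = np.full(dim, 10.0)
collection = shared + rng.random((100, dim))

raw = cosine_similarity(collection[0], collection[1])

# Subtract the collection mean so the shared component no longer dominates.
centered = collection - collection.mean(axis=0)
centered_sim = cosine_similarity(centered[0], centered[1])

print(f"raw: {raw:.4f}, mean-centered: {centered_sim:.4f}")
```

With the shared component in place, two unrelated synthetic vectors score above 0.99; after centering, the same pair scores near 0.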

Do you have suggestions for other approaches I could use, such as other models or other ways to extract features from these images, that would help me find images similar to the one presented?

Thank you