Different Embeddings depending on batch size

I was testing the cosine similarity of embeddings from an untrained ResNet (and other models) when I noticed that I would always get values of roughly 1.0 for any pair of pictures.

I have some example code which demonstrates this:

from PIL import Image
import torch
import torch.nn.functional as F
from torchvision import transforms

img0 = Image.open("train/006_000.png")
img1 = Image.open("train/006_001.png")
tf_temp = transforms.Compose([transforms.Resize((256, 256)),
                              transforms.ToTensor(),  # PIL image -> tensor, required before Normalize
                              transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
img0 = tf_temp(img0).unsqueeze(0)
img1 = tf_temp(img1).unsqueeze(0)

batch = torch.cat([img0,img1],0)
embedding_temp = net.embedding(batch)
embedding_temp_1 = net.embedding(img0)
embedding_temp_2 = net.embedding(img1)
print(F.cosine_similarity(embedding_temp[0], embedding_temp[1], dim=0).item())
print(F.cosine_similarity(embedding_temp_1, embedding_temp_2, dim=1).item())

Note: net.embedding is the resnet.

I expected the results to be the same, since I only changed how many tensors are processed at a time. The first forward pass batches both tensors, while the second and third process img0 and img1 separately. With the separate variant I always get values near 1.0. Why?

Edit: When I set net.embedding.eval(), both numbers are the same. However, I still get values near 1.0 all the time, and I am still unsure why.
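For reference, the batch dependence seems to come from the BatchNorm layers inside the ResNet: in train mode they normalize with the current batch's statistics, so an image's output depends on which other images share the batch. A minimal sketch with a standalone BatchNorm layer (a toy example, not my actual net):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)

x0 = torch.randn(1, 4)
x1 = torch.randn(1, 4)
x2 = torch.randn(1, 4)

# train(): normalization uses the *current batch's* mean/variance,
# so x0's output changes depending on which sample it is batched with.
bn.train()
train_a = bn(torch.cat([x0, x1], 0))[0]
train_b = bn(torch.cat([x0, x2], 0))[0]
print(torch.allclose(train_a, train_b))  # False

# eval(): normalization uses the stored running statistics,
# so x0's output no longer depends on its batch mates.
bn.eval()
eval_a = bn(torch.cat([x0, x1], 0))[0]
eval_b = bn(torch.cat([x0, x2], 0))[0]
print(torch.allclose(eval_a, eval_b))  # True
```

This matches what I see: after `eval()` the batched and separate results agree.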


An untrained network can output very high similarity between different images. I believe it's because one of the activations dominates the others in magnitude, leading to a very high similarity. Take a pretrained network, input two different images, and the similarity should not be close to 1 :slight_smile:
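As an illustration, here is a toy stand-in for an untrained net (not your actual ResNet): with random weights, the ReLU makes every embedding coordinate non-negative, and global average pooling leaves each channel hovering around its mean activation regardless of the input, so the embeddings of unrelated inputs end up nearly parallel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for an untrained ResNet: random conv weights,
# ReLU, then global average pooling into an embedding vector.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
net.eval()

# Two completely unrelated random "images".
img0 = torch.randn(1, 3, 64, 64)
img1 = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    e0, e1 = net(img0), net(img1)

# Non-negative embeddings all point into the same orthant,
# which pushes cosine similarity toward 1.
sim = F.cosine_similarity(e0, e1, dim=1).item()
print(sim)
```

In my runs this prints a value very close to 1.0. A trained network learns filters whose responses actually discriminate between inputs, which is why the pretrained similarity drops.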