Pytorch Facenet MTCNN Image output

I am building a face recognition app using facenet-pytorch in Python, with two methods.

1st method Code -

from facenet_pytorch import MTCNN, InceptionResnetV1
import matplotlib.pyplot as plt

resnet = InceptionResnetV1(pretrained='vggface2').eval()
mtcnn = MTCNN(image_size=96)

img =
image_prep = mtcnn(img)  # detects, crops, and preprocesses the face (or returns None)
if image_prep is not None:
  plt.imshow(image_prep.permute(1, 2, 0))
  image_embedding = resnet(image_prep.unsqueeze(0))

In this code, I extract the face from the given image and get a 512-dimensional embedding for recognizing faces.

In this example, I used two different faces and computed the distance between them:

        a.jpg       b.jpg
a.jpg   0.000000    1.142466
b.jpg   1.142466    0.000000

It works well…
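For reference, here is a minimal sketch of how a distance table like the one above can be computed. The `emb_a`/`emb_b` tensors are placeholders standing in for the actual `resnet` outputs of a.jpg and b.jpg, not values from the original code:

```python
import torch

# Placeholder embeddings standing in for resnet(image_prep.unsqueeze(0))
# for a.jpg and b.jpg; each is a (1, 512) tensor.
emb_a = torch.randn(1, 512)
emb_b = torch.randn(1, 512)

# Euclidean (L2) distance between the two 512-d embeddings
dist = (emb_a - emb_b).norm().item()

# Full pairwise table, as in the post: zeros on the diagonal,
# the a-to-b distance in the off-diagonal cells.
embs = torch.cat([emb_a, emb_b])   # shape (2, 512)
table = torch.cdist(embs, embs)    # shape (2, 2)
print(table)
```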

2nd method Code-

from torchvision import transforms

img =
boxes, probs = mtcnn.detect(img)  # gives the coordinates of the faces in the image
face = img.crop((boxes[0][0], boxes[0][1], boxes[0][2], boxes[0][3]))  # crop the first face
pil_to_tensor = transforms.ToTensor()(face).unsqueeze_(0)  # convert to a batched tensor
image_embedding = resnet(pil_to_tensor)

In this code, I first get the coordinates of the face and then the embeddings. The distance between the two faces:

        a.jpg       b.jpg
a.jpg   0.000000    0.631094
b.jpg   0.631094    0.000000

In the 1st method, I directly feed the image to MTCNN and get a better result: the distance between the two faces is more than 1.0. In the 2nd method, I get the coordinates of the faces using mtcnn.detect(), crop the face from the given image, and feed it to resnet. This method gives a smaller distance between the two different faces.

Then I looked for the reason why the 1st method performs better than the 2nd by plotting the result (the face) before feeding it to resnet.

In the 2nd method, I feed the faces exactly as they appear in the input image (a clear image), by cropping them with mtcnn.detect().

But in the 1st method, I directly give the input to mtcnn(img), which returns a tensor of the face that is darker than in the input image. This darker image is not clear (the area around the eyes is darker; I tested with many photos), and the eyes are not clearly visible. This is why the 1st method shows a higher distance between the two faces.

My doubt is: why does MTCNN return the face tensor darkened, and how can I solve it? Please help me with this issue.


Could you show an example of the “dark” output and what you would expect it to look like?

@ptrblck Sir, for example, I used my own face.

This is the original image:

This is the image I get from mtcnn(image):

I tried with many images, but the returned faces are dark and some are not very clear.

Flow:
Image --> MTCNN (extracts the face as a tensor) --> resnet (512-d embedding) --> distance between the two 512-d embeddings

Is there any other preprocessing that needs to be done, Sir?


It looks like the second image is normalized, i.e. I guess some kind of histogram normalization might have been applied.
Would you like to apply the same processing steps in the second approach or what exactly is the issue?
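As a sketch of what is likely happening: by default, facenet-pytorch's MTCNN standardizes the returned face tensor via fixed_image_standardization, mapping pixel values from [0, 255] to roughly [-1, 1]; matplotlib then clips that range when plotting, which makes the face look dark. The snippet below re-implements that standardization from the library's definition (so it runs without facenet-pytorch installed) and shows how to undo it for display; the `raw_face` tensor is a placeholder:

```python
import torch

# fixed_image_standardization as defined in facenet-pytorch:
# (image_tensor - 127.5) / 128.0, applied to pixel values in [0, 255].
def fixed_image_standardization(image_tensor):
    return (image_tensor - 127.5) / 128.0

# Dummy face tensor standing in for the raw MTCNN crop (values in [0, 255])
raw_face = torch.rand(3, 96, 96) * 255

standardized = fixed_image_standardization(raw_face)  # roughly in [-1, 1]

# Undo the standardization to recover a displayable image in [0, 1]
displayable = (standardized * 128.0 + 127.5) / 255.0
```

If the goal is only to *display* the MTCNN output, un-standardizing like this (or constructing the detector with post-processing disabled, if the library version supports it) avoids the dark appearance; for recognition, the standardized tensor is what the pretrained resnet expects.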

@ptrblck Sir, I gave the 1st image to MTCNN and got the second image. Why is the second image darker? I thought MTCNN simply gives the face part from the given image, not a darkened version of the face. Thanks.