Keypoints detection program

Sylvain_Ard · December 20, 2023, 1:13pm

By one point per object I mean, for example, the left elbow of a human: there is only one point of this class for a “human” object. For several points per class, take the teeth of a plant leaf: there can be anywhere from zero to any number of them, and we don’t know the exact count in advance, but they all belong to the class “teeth”.

maog77 · December 20, 2023, 1:17pm

OK, taking the teeth of a plant leaf as the example: for each leaf you passed as an object, did you have to annotate all the teeth in the training dataset, or just one tooth per object?

Sylvain_Ard · December 20, 2023, 1:18pm

All the teeth in the training dataset, so the COCO format and mmpose don’t work.
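
To make the mismatch concrete, here is a minimal sketch. It assumes a simplified COCO-style layout in which every object stores exactly one [x, y, visibility] triplet per keypoint name, and contrasts it with a variable-length layout. The `points_by_class` key, the `to_point_rows` helper and the file name are hypothetical names made up for this illustration; they are not part of COCO or mmpose.

```python
# Simplified COCO-style keypoints: a fixed number of slots per object,
# one [x, y, visibility] triplet per keypoint name.
coco_human = {
    "keypoint_names": ["left_elbow", "right_elbow"],
    "keypoints": [120, 85, 2, 180, 90, 2],  # always len(names) * 3 values
}

# Hypothetical variable-length layout: each class maps to a list of 0..n points.
leaf = {
    "points_by_class": {
        "teeth": [(12, 40), (30, 22), (55, 18)],
        "lobes": [],  # a class may have no points at all on a given object
    }
}


def to_point_rows(image_name, obj):
    """Flatten one object into (image, class, x, y) rows, one row per point."""
    rows = []
    for class_name, pts in obj["points_by_class"].items():
        for x, y in pts:
            rows.append((image_name, class_name, x, y))
    return rows


rows = to_point_rows("leaf_001.jpg", leaf)
```

One row per point is also the shape of table that the Excel-based scripts further down in this thread read.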

maog77 · December 20, 2023, 1:21pm

OK, and you only specified one class for keypoint detection?

Sylvain_Ard · December 20, 2023, 1:23pm

I haven’t launched any training because I can’t find a program that does this, but there could be several classes, for example leaf teeth and leaf lobes.

maog77 · December 20, 2023, 1:24pm

OK, I’ll try to run some tests.

Sylvain_Ard · December 20, 2023, 1:27pm

I asked ChatGPT for code to do this and this is what it gave me. I haven’t tested it:

import pandas as pd
import numpy as np
import cv2
import tensorflow as tf

SIZE = 224        # model input size
NUM_CLASSES = 4   # one heatmap channel per point class

def load_image(path):
    # read the image, convert BGR to RGB, resize it and scale it for MobileNetV2
    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (SIZE, SIZE))
    return tf.keras.applications.mobilenet_v2.preprocess_input(image.astype(np.float32))

# load the training data from the Excel file
df = pd.read_excel('points.xlsx')

# extract the unique image names
image_names = df['nom_image'].unique()

# preprocess the images and the annotations
images = []
annotations = []
for name in image_names:
    # load and preprocess the image, then add it to the list of images
    images.append(load_image(name))
    # extract the annotations for this image
    image_df = df[df['nom_image'] == name]
    # build the annotation heatmap, one channel per class
    heatmap = np.zeros((SIZE, SIZE, NUM_CLASSES), dtype=np.float32)
    for _, row in image_df.iterrows():
        # x and y are assumed to be normalized to [0, 1]; clamp so that 1.0 stays in bounds
        px = min(int(row['x'] * SIZE), SIZE - 1)
        py = min(int(row['y'] * SIZE), SIZE - 1)
        # mark the point in the channel of its class
        heatmap[py, px, int(row['classe'])] = 1
    # add the heatmap to the list of annotations
    annotations.append(heatmap)

# convert the data to numpy arrays
images = np.array(images)
annotations = np.array(annotations)

# define the model
model = tf.keras.models.Sequential([
    tf.keras.applications.MobileNetV2(include_top=False, input_shape=(SIZE, SIZE, 3)),
    tf.keras.layers.Conv2D(NUM_CLASSES, kernel_size=(1, 1)),
    # the backbone outputs a 7x7 map; upsample by 32 to match the 224x224 heatmaps
    tf.keras.layers.UpSampling2D(size=32, interpolation='bilinear'),
    tf.keras.layers.Activation('sigmoid')
])

# compile the model (single-pixel targets are very sparse, so accuracy is not very informative)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# train the model
model.fit(images, annotations, epochs=10, batch_size=16)

# save the model
model.save('model.h5')

# load the trained model
model = tf.keras.models.load_model('model.h5')

# load the test image with the same preprocessing as in training
image = load_image('test_image.jpg')

# run a prediction on the image
heatmap = model.predict(np.array([image]))

# extract the detected points from the heatmap
points = []
for i in range(NUM_CLASSES):
    heatmap_channel = np.ascontiguousarray(heatmap[0, :, :, i])
    # minMaxLoc returns only the strongest peak, so this finds at most one point per class
    _, max_val, _, max_loc = cv2.minMaxLoc(heatmap_channel)
    if max_val > 0.5:
        points.append((max_loc[0] / SIZE, max_loc[1] / SIZE))

# print the detected points
print("Detected points:", points)
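
Since cv2.minMaxLoc returns a single maximum per channel, the loop above can report at most one point per class, which does not cover the "0 to any number of teeth" case. A minimal sketch of one possible alternative is to keep every local maximum above the threshold. The `find_peaks` helper below is a hypothetical name, written in plain Python on nested lists so that it runs without any dependency; it is an illustration, not what ChatGPT produced.

```python
def find_peaks(heatmap, threshold=0.5):
    """Return (x, y, score) for every local maximum above threshold.

    heatmap is a 2D list of floats indexed as heatmap[y][x].
    A pixel counts as a peak if none of its 8 neighbours is higher,
    so every pixel of a flat plateau is reported.
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v <= threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v >= n for n in neighbours):
                peaks.append((x, y, v))
    return peaks


# Toy 5x5 heatmap with two separate blobs of the same class
toy = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.8, 0.2],
    [0.0, 0.0, 0.0, 0.2, 0.0],
]
peaks = find_peaks(toy)  # two peaks: (1, 1) and (3, 3)
```

On a real 224x224 heatmap a vectorized version would be much faster; the logic stays the same.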


With padding:
import cv2
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.models import Model

SIZE = 672          # padded input size
NUM_CLASSES = 4     # one heatmap channel per point class
GRID = SIZE // 32   # VGG16 without its top downsamples by 32: 672 -> 21

# Load the training data from an Excel file (one row per annotated point)
train_data = pd.read_excel('train_data.xlsx')

# Load each training image once and build the heatmap of its points
X_train = []
Y_train = []
for name, rows in train_data.groupby('nom image'):
    image = cv2.imread(name)
    if image is None:
        raise FileNotFoundError(name)
    # Compute the padding
    h, w, _ = image.shape
    ratio = min(SIZE / h, SIZE / w)
    nw, nh = int(w * ratio), int(h * ratio)
    left = (SIZE - nw) // 2
    right = SIZE - nw - left
    top = (SIZE - nh) // 2
    bottom = SIZE - nh - top
    img_resized = cv2.resize(image, (nw, nh))
    image = cv2.copyMakeBorder(img_resized, top, bottom, left, right, cv2.BORDER_CONSTANT)
    X_train.append(image)
    # x and y are assumed to be pixel coordinates in the original image
    heatmap = np.zeros((GRID, GRID, NUM_CLASSES), dtype=np.float32)
    for _, row in rows.iterrows():
        # move the point into the padded image, then onto the output grid
        gx = min(int((row['x'] * ratio + left) * GRID / SIZE), GRID - 1)
        gy = min(int((row['y'] * ratio + top) * GRID / SIZE), GRID - 1)
        heatmap[gy, gx, int(row['classe du point'])] = 1
    Y_train.append(heatmap)
X_train = np.array(X_train, dtype=np.float32) / 255.0
Y_train = np.array(Y_train)

# Load the pre-trained VGG16 model
vgg16 = VGG16(include_top=False, input_shape=(SIZE, SIZE, 3))

# Freeze the VGG16 layers
for layer in vgg16.layers:
    layer.trainable = False

# Add a 1x1 convolution head that outputs one heatmap channel per class.
# The original Flatten/Dense/softmax head was a 4-class image classifier
# and could not be trained on point coordinates.
x = Conv2D(NUM_CLASSES, kernel_size=1, activation='sigmoid')(vgg16.output)

# Create the final model
model = Model(inputs=vgg16.input, outputs=x)

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')

# Train the model
model.fit(X_train, Y_train, epochs=10, batch_size=32)

# Save the trained model
model.save('model.h5')

# Test
import cv2
import numpy as np
import tensorflow as tf

SIZE = 672  # padded input size, same as in training

# Load the trained model
model = tf.keras.models.load_model('model.h5')

# Load the test image
image_path = 'test_image.jpg'
img = cv2.imread(image_path)
if img is None:
    raise FileNotFoundError(image_path)

h, w, _ = img.shape
# Compute the padding
ratio = min(SIZE / h, SIZE / w)
nw, nh = int(w * ratio), int(h * ratio)
left = (SIZE - nw) // 2
right = SIZE - nw - left
top = (SIZE - nh) // 2
bottom = SIZE - nh - top
img_resized = cv2.resize(img, (nw, nh))
img_padded = cv2.copyMakeBorder(img_resized, top, bottom, left, right, cv2.BORDER_CONSTANT)

# Run the prediction (same 1/255 scaling as in training)
pred = model.predict(np.array([img_padded], dtype=np.float32) / 255.0)

# Get the heatmap of class 1
heatmap = np.ascontiguousarray(pred[0][:, :, 1])

# Resize the heatmap to the padded input size
# (a no-op if the model already outputs SIZE x SIZE)
heatmap = cv2.resize(heatmap, (SIZE, SIZE))

# Crop the heatmap to remove the padding
heatmap = heatmap[top:top+nh, left:left+nw]

# Collect the predicted points. The heatmap is not min-max normalized:
# that would force its maximum to 1 and always produce a detection.
threshold = 0.5
ys, xs = np.where(heatmap > threshold)
points_pred = list(zip(xs, ys))

# Draw the predicted points on the image
for px, py in points_pred:
    # the padding has already been cropped, so only the scaling is undone
    x = int(px / ratio)
    y = int(py / ratio)
    cv2.circle(img, (x, y), 5, (0, 0, 255), -1)

cv2.imshow('output', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
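
The padding arithmetic used in both scripts can be isolated into small functions, which makes it easier to check that mapping a point into the padded image and back returns the original coordinates. This is only a sketch: `letterbox_params`, `to_padded` and `to_original` are hypothetical helper names, and the 672 default simply mirrors the size used above.

```python
def letterbox_params(w, h, size=672):
    """Scale factor and left/top offsets that fit a w x h image in a square."""
    ratio = min(size / h, size / w)
    nw, nh = int(w * ratio), int(h * ratio)
    left = (size - nw) // 2
    top = (size - nh) // 2
    return ratio, left, top


def to_padded(x, y, ratio, left, top):
    """Original-image pixel coordinates -> padded-image coordinates."""
    return x * ratio + left, y * ratio + top


def to_original(px, py, ratio, left, top):
    """Padded-image coordinates -> original-image pixel coordinates."""
    # subtract the offset first, then undo the scaling
    return (px - left) / ratio, (py - top) / ratio


# A 1344x672 image is halved to 672x336 and centred vertically
ratio, left, top = letterbox_params(w=1344, h=672)
padded = to_padded(100, 50, ratio, left, top)
restored = to_original(*padded, ratio, left, top)
```

If the heatmap has already been cropped to remove the padding, the offsets are zero and only the division by `ratio` remains.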


ErikDerGute · April 11, 2024, 10:52am

Hey, I found this topic while having some issues of my own. I also want to train the built-in R-CNN model for keypoint and object detection on a custom dataset. However, sooner or later I always run into the same error: “IndexError: index 168 is out of bounds for dimension 0 with size 168”. I found this article: How to Train a Custom Keypoint Detection Model with PyTorch | Medium. They did something similar, but when I adapted the code for my application it didn’t work. The full code and a dataset example can be found here: IndexError: index 168 is out of bounds for dimension 0 with size 168 in keypointrcnn_loss · Issue #8371 · pytorch/vision · GitHub. Maybe someone knows a solution to my problem, or at least has an idea of how to adapt the code correctly for my application.

Sylvain_Ard · April 11, 2024, 2:07pm

I can recommend “mmpose”, which works; I have already used it.
My question is specific to yolov9: does it do keypoint detection, and if not, is it planned?

ErikDerGute · April 11, 2024, 8:18pm

Hey, thanks for your advice. Fortunately I was able to fix the error myself: an incorrect shape of my keypoints in target{} was causing my problems. I had already tried mmpose, only to realize that keypoint and object detection for more than one class does not seem to be part of the framework? Also, trying to train an RSN network on a custom dataset gave me bad errors that I couldn’t fix myself. Do you have any good guidance for custom training with mmpose? Finally, you may want to check out the ultralytics framework, which is very easy to use. For example, three lines of code train the yolov8 model on a custom dataset. I am assuming that they will also add yolov9 in the future. Perhaps we should exchange ideas in more detail?
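
For anyone hitting a similar shape problem, a quick sanity check before training can help. The sketch below assumes the target layout documented for torchvision's Keypoint R-CNN, a keypoints tensor of shape [N, K, 3] with each keypoint given as [x, y, visibility]. `check_keypoints` is a hypothetical helper that works on nested lists so that it runs without PyTorch; it is not necessarily the exact fix that was applied here.

```python
def check_keypoints(keypoints, num_boxes, num_keypoints):
    """Raise ValueError unless keypoints is nested as [N][K][3]."""
    if len(keypoints) != num_boxes:
        raise ValueError(f"expected {num_boxes} objects, got {len(keypoints)}")
    for obj in keypoints:
        if len(obj) != num_keypoints:
            raise ValueError(
                f"expected {num_keypoints} keypoints per object, got {len(obj)}"
            )
        for kp in obj:
            if len(kp) != 3:
                raise ValueError("each keypoint must be [x, y, visibility]")
    return True


# One object with two keypoints, correctly nested as [N][K][3]
good = [[[10.0, 20.0, 1], [30.0, 40.0, 1]]]
# The same values flattened per object, which the check rejects
flat = [[10.0, 20.0, 1, 30.0, 40.0, 1]]

good_ok = check_keypoints(good, num_boxes=1, num_keypoints=2)
try:
    check_keypoints(flat, num_boxes=1, num_keypoints=2)
    flat_ok = True
except ValueError:
    flat_ok = False
```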

Sylvain_Ard · April 11, 2024, 9:31pm

OK, write to me at sylvain.ard@gmail.com and we can talk over Skype; configuring mmpose is not something that can be explained in a single email.