Hello. I am trying to build a simple skeleton-based posture recognition system for animals. I have fine-tuned a YOLO model. For every frame of a video, I detect a dog with YOLO and then compute 2D skeletal keypoints with ViTPose. Then I compute the following features:
- bounding box aspect ratio
- all keypoint coordinates, normalized with respect to the bounding box center
- some joint angles (unsigned, computed with `torch.nn.functional.cosine_similarity`)
- some inter-keypoint distances (signed, normalized with respect to the bounding box)
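For concreteness, the per-frame geometry could be computed roughly like this (a minimal NumPy sketch; `joint_angle_deg` and `frame_features` are hypothetical helper names, and the angle goes through a cosine, mirroring the cosine-similarity approach):

```python
import numpy as np

def joint_angle_deg(a, b, c):
    # unsigned angle at joint b formed by segments b->a and b->c, in degrees
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def frame_features(keypoints, bbox):
    # keypoints: (K, 2) array of (x, y); bbox: (x1, y1, x2, y2)
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2.0, y1 + h / 2.0
    aspect = w / h                               # bounding box aspect ratio
    norm_kpts = (keypoints - [cx, cy]) / [w, h]  # centered on bbox, scale-normalized
    return aspect, norm_kpts
```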
To account for occlusions and low-confidence predictions, I create a mask array of 1s and 0s with the same shape as the features.
In my custom dataset class, I have:
```python
def create_overlapping_sequences(self, features, masks, class_idx):
    num_frames, num_features = features.shape[0], features.shape[1]
    sequences = []
    labels = []
    mask_sequences = []
    # stop at num_frames - sequence_length so every window is full length;
    # the previous bound, num_frames - (sequence_length - stride), could yield
    # a short final window whenever the stride doesn't divide evenly
    for frame_index in range(0, num_frames - self.sequence_length + 1, self.stride):
        sequences.append(features[frame_index : frame_index + self.sequence_length])
        mask_sequences.append(masks[frame_index : frame_index + self.sequence_length])
        labels.append(class_idx)
    return sequences, mask_sequences, labels
```
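As a sanity check, the windowing logic can be exercised on dummy data (`_Windows` is a throwaway stand-in class for illustration only; the shapes are made up):

```python
import numpy as np

class _Windows:
    # minimal stand-in exposing the same method, for illustration
    def __init__(self, sequence_length, stride):
        self.sequence_length, self.stride = sequence_length, stride

    def create_overlapping_sequences(self, features, masks, class_idx):
        num_frames = features.shape[0]
        sequences, mask_sequences, labels = [], [], []
        # stop early enough that every window is exactly sequence_length long
        for i in range(0, num_frames - self.sequence_length + 1, self.stride):
            sequences.append(features[i : i + self.sequence_length])
            mask_sequences.append(masks[i : i + self.sequence_length])
            labels.append(class_idx)
        return sequences, mask_sequences, labels

feats = np.zeros((11, 4))          # 11 frames, 4 features per frame
masks = np.ones_like(feats)
seqs, mseqs, labels = _Windows(5, 2).create_overlapping_sequences(feats, masks, 0)
# window 5, stride 2 over 11 frames -> starts at 0, 2, 4, 6 -> 4 full windows
```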
This creates overlapping sequences as a form of data augmentation. Then, in the LSTM classifier, I do:
```python
def forward(self, x, mask=None):
    """
    Forward pass.
    Args:
        x: Input tensor of shape (batch_size, sequence_length, num_features)
        mask: Optional tensor of the same shape as x; 1 = valid, 0 = occluded
    Returns:
        logits: Output tensor of shape (batch_size, num_classes)
    """
    batch_size, seq_len, num_features = x.shape
    if mask is not None:
        x = x * mask  # zero out occluded / low-confidence features
    x = self.input_projection(x)
    lstm_out, (hidden, cell) = self.lstm(x)
    last_output = lstm_out[:, -1, :]  # hidden state at the final timestep
    logits = self.classifier(last_output)
    return logits
```
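Put together, a minimal end-to-end version of the classifier looks like this (layer sizes, hidden width, and class count here are assumptions, not my actual config):

```python
import torch
import torch.nn as nn

class PostureLSTM(nn.Module):
    # minimal sketch of the projection -> LSTM -> last-step classifier pipeline
    def __init__(self, num_features, hidden=64, num_classes=4):
        super().__init__()
        self.input_projection = nn.Linear(num_features, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x, mask=None):
        if mask is not None:
            x = x * mask                            # zero out occluded features
        x = self.input_projection(x)
        lstm_out, _ = self.lstm(x)
        return self.classifier(lstm_out[:, -1, :])  # last timestep -> logits

model = PostureLSTM(num_features=10)
logits = model(torch.randn(2, 30, 10), mask=torch.ones(2, 30, 10))
# logits has shape (batch_size, num_classes) = (2, 4)
```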
I am trying to handle occluded features while masking. For example, when calculating the angle between three joints: if all three keypoints have visibility 1, I add the angle (in degrees) to my feature array and append a 1 to the mask array; if even one keypoint has visibility 0, I append a 0 to the mask and use -999 (a large sentinel value) as the feature. The same scheme applies to all other features, including the normalized keypoint coordinates (relative to the bounding box center) and the inter-joint distances.
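For a single angle feature, that gating looks like this (`gate` is a hypothetical helper name; the sentinel value matches the scheme described above). Note what happens when the sentinel later meets the `x * mask` step of the forward pass:

```python
SENTINEL = -999.0   # placeholder written for occluded features, as described

def gate(value, visibilities):
    # keep the feature only if every contributing keypoint is visible;
    # otherwise write the sentinel and a mask bit of 0
    if all(v == 1 for v in visibilities):
        return value, 1.0
    return SENTINEL, 0.0

feat, m = gate(137.2, [1, 1, 0])   # one occluded keypoint
# in the forward pass, x * mask maps the sentinel to 0 before the projection
assert feat * m == 0.0
```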
No matter what I do, the classifier keeps predicting the same label for every posture. How do I go about debugging this? I have been at it for two days now and am at my wits' end.