I’m trying to build a sequence-to-sequence LSTM that outputs a series of human poses as 3D coordinates. The data I’m training on is kept within a 1x1x1 bounding box, so all the coordinate values are <1. A standard loss such as MSE gives me very small loss values even when the error is large relative to the size of a pose, since every error is smaller than 1 and squaring shrinks it further. I also realized that since the poses come from video frames (24 FPS), the frame-to-frame movement is very small, so my model ends up predicting a single pose over and over. I figured the best way to make the model output a series of distinct poses would be to calculate the loss on the diff of the values (e.g. np.diff(values)). However, these values would be even smaller, so I tried to address this with custom loss functions that compute percentage-based error.
However, these attempts didn’t really solve my issue. When I applied percentage-based error to the coordinates themselves, I got noisy output that moved around somewhat, but the poses weren’t anatomically correct (the point representing the center of the hip would be way off, for example). Applying it to only the diff of the values didn’t give sensible output either (the coordinates were all clustered in a blob). I also tried MAPE and relative percent difference on the diffs of the values, and in both cases my model returned NaN. Could anyone give me some insight into what might be wrong with my approach, or suggest better approaches?
For context, here are the error functions I created to try and do percentage error:
def MAPELoss(output, target):
    return torch.mean(torch.abs((target - output) / target))

def RPDLoss(output, target):
    return torch.mean(torch.abs(target - output) / ((torch.abs(target) + torch.abs(output)) / 2))

def MAPELoss_Diff(output, target):
    output = output[1:] - output[:-1]
    target = target[1:] - target[:-1]
    return torch.mean(torch.abs((target - output) / target))

def RPDLoss_Diff(output, target):
    output = output[1:] - output[:-1]
    target = target[1:] - target[:-1]
    return torch.mean(torch.abs(target - output) / ((torch.abs(target) + torch.abs(output)) / 2))
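One possible source of the NaN values can be reproduced with a toy example. It assumes that at least one coordinate does not change between two consecutive frames, so the target diff contains an exact zero and the division becomes 0/0. The function names `mape_loss_diff` and `mape_loss_diff_eps` and the value of `eps` are hypothetical, chosen only for this sketch. This is not necessarily what happens in the real training run.

```python
import torch

def mape_loss_diff(output, target):
    # Same computation as MAPELoss_Diff: percentage error on
    # frame-to-frame differences taken along dimension 0.
    output = output[1:] - output[:-1]
    target = target[1:] - target[:-1]
    return torch.mean(torch.abs((target - output) / target))

def mape_loss_diff_eps(output, target, eps=1e-8):
    # Hypothetical variant: eps (an assumed value) keeps the
    # denominator strictly positive so 0/0 cannot occur.
    output = output[1:] - output[:-1]
    target = target[1:] - target[:-1]
    return torch.mean(torch.abs(target - output) / (torch.abs(target) + eps))

# Toy sequence: 3 frames of one joint (x, y, z).
# The joint is static between frames 0 and 1, so those diffs are exactly 0.
target = torch.tensor([[0.5, 0.5, 0.5],
                       [0.5, 0.5, 0.5],
                       [0.6, 0.5, 0.4]])
output = target.clone()  # a perfect prediction

nan_loss = mape_loss_diff(output, target)       # contains 0/0 terms
safe_loss = mape_loss_diff_eps(output, target)  # denominators are >= eps

print(torch.isnan(nan_loss).item())
print(torch.isfinite(safe_loss).item())
```

Two caveats on this sketch. A tiny `eps` removes the NaN, but any imperfect prediction on a near-static joint still produces a very large loss term, so a larger `eps` or a clamped denominator may be needed in practice. Also, the diff is taken along dimension 0, which is only the time axis if the tensors are laid out as (frames, ...). With a batch-first layout, the slicing would compare different samples rather than consecutive frames.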