I have two losses: a frame prediction loss and an object classification loss. The frame prediction loss is a number between 0 and 1, while the object classification loss is a vector (for the Moving MNIST dataset, say, a vector of size 10×1) that gives the loss between two one-hot vectors (the predicted one and the ground truth). I am not sure how to combine these two. What are some ways to combine a scalar loss with a vector loss?

My input is video, and for each frame I am trying to jointly predict the frame itself and the objects in it.

The frame prediction loss is the pixel-level mean squared error between two consecutive frames, and the object classification loss is the cross-entropy between the ground-truth one-hot vector and the predicted one-hot vector.
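
To make the setup concrete, here is a minimal sketch of the two losses in plain Python, using toy values rather than real data. It assumes the predicted class vector is a probability distribution (for example a softmax output) rather than a strict one-hot vector, and that the cross-entropy terms are summed over the 10 classes, which would give a single scalar instead of a vector. The function names and the `weight` parameter are hypothetical, and the weighted sum is shown only as one common way to combine two scalar losses.

```python
import math


def mse_loss(pred_frame, true_frame):
    """Mean squared error over flattened pixel values."""
    n = len(pred_frame)
    return sum((p - t) ** 2 for p, t in zip(pred_frame, true_frame)) / n


def cross_entropy_loss(pred_probs, true_one_hot, eps=1e-12):
    """Cross-entropy between predicted class probabilities and a one-hot target.

    The per-class terms are summed, so the result is a scalar.
    `eps` guards against log(0).
    """
    return -sum(t * math.log(p + eps) for p, t in zip(pred_probs, true_one_hot))


def combined_loss(pred_frame, true_frame, pred_probs, true_one_hot, weight=1.0):
    """Weighted sum of the two scalar losses.

    `weight` is a hypothetical hyperparameter that balances the two terms.
    """
    frame_term = mse_loss(pred_frame, true_frame)
    class_term = cross_entropy_loss(pred_probs, true_one_hot)
    return frame_term + weight * class_term


# Toy example: a 4-pixel "frame" and a 10-class prediction (assumed values).
pred_frame = [0.1, 0.8, 0.4, 0.0]
true_frame = [0.0, 1.0, 0.5, 0.0]

pred_probs = [0.05] * 10
pred_probs[3] = 0.55          # probabilities sum to 1
true_one_hot = [0.0] * 10
true_one_hot[3] = 1.0         # ground-truth class is 3

total = combined_loss(pred_frame, true_frame, pred_probs, true_one_hot, weight=0.5)
print(total)
```

Under these assumptions both terms are scalars, so the open question becomes how to choose `weight` so that neither loss dominates training.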