Classifying images using elapsed time as an additional input

Hi, I have a problem in which I want to classify images into classes A, B, and C based on the objects in the image. I also want to introduce a temporal vector that stores the time elapsed since the start of the sequence, and use it as a second input, because the classes A, B, and C depend on time as well as on image content.
I understand how the image model and the time-based model could be trained as separate tasks, but I am curious whether they can be combined into a single model and trained jointly.
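
To make the idea concrete, here is a rough sketch of the kind of combined model I have in mind, written in PyTorch. Everything in it is an assumption for illustration only: the class name `ImageTimeClassifier`, the layer sizes, the 64x64 RGB inputs, and the choice to feed elapsed time as a single normalized scalar per image. It simply concatenates CNN image features with an embedding of the elapsed time and trains both branches with one cross-entropy loss; I am not sure this is the best way to fuse them.

```python
import torch
import torch.nn as nn


class ImageTimeClassifier(nn.Module):
    """Hypothetical model: CNN image features fused with elapsed time."""

    def __init__(self, num_classes: int = 3, time_dim: int = 8):
        super().__init__()
        # Small CNN encoder (assumed); any image backbone could stand in here.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 32, 1, 1)
            nn.Flatten(),             # -> (batch, 32)
        )
        # Embed the scalar elapsed time into a small feature vector.
        self.time_encoder = nn.Sequential(nn.Linear(1, time_dim), nn.ReLU())
        # Classify from the concatenated image and time features.
        self.classifier = nn.Linear(32 + time_dim, num_classes)

    def forward(self, images: torch.Tensor, elapsed: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(images)   # (batch, 32)
        time_feat = self.time_encoder(elapsed)  # (batch, time_dim)
        fused = torch.cat([img_feat, time_feat], dim=1)
        return self.classifier(fused)           # (batch, num_classes) logits


# Dummy batch: 4 RGB images plus elapsed time scaled to [0, 1] (made-up data).
model = ImageTimeClassifier()
images = torch.randn(4, 3, 64, 64)
elapsed = torch.rand(4, 1)
labels = torch.tensor([0, 1, 2, 0])  # classes A, B, C encoded as 0, 1, 2

logits = model(images, elapsed)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()  # a single loss sends gradients into both branches
```

Because there is only one loss, the image branch and the time branch would be trained together end to end rather than as two separate tasks.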