How to combine 2 images as input of training data

Hello everyone,

I am a developer and a newbie in machine learning. I have been researching optical motion detection. I work for a golf company, and we have been using Polhemus devices for motion tracking. In our system, each swing (a shot) is captured by 2 cameras from 2 sides (the side and the front of the player). So the output of each swing is 2 videos, one from each side, and a list of motion numbers for each video frame.

Basically, I can convert those videos to frames and get the motion numbers for each frame. So I intend to use the existing data to train a model, then use it to predict motion for new swings instead of using the Polhemus devices.

I have a few questions and I hope someone here can help me:

  1. Where should I start with machine learning to solve this problem? Could you please suggest some algorithms or frameworks? I know that PyTorch is a great one, but I am not sure whether it fits my project.
  2. Is it possible to combine the 2 sides of a frame as input for training in PyTorch? I think that combining the 2 sides of a frame will improve the accuracy.

Thank you so much!

I’m not sure what exactly your prediction looks like.
Are you trying to predict new video frames, or what do you mean by motion?
Could you post a dummy prediction?

I think PyTorch will meet your needs. Have a look at the tutorials to get familiar with the framework.

It’s possible to combine input frames; how best to do it depends on the use case.
You could combine them in the depth (channel) dimension, i.e. you would have an input of [batch_size, channels*2, h, w], although I’m sceptical it’s the most useful approach here.
Another way would be to combine them spatially, e.g. stacked along the height dimension, so your input would be [batch_size, channels, 2*h, w].
Also, you could use the frames separately for some layers and then combine the activations later in the model.
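
To make the three options a bit more concrete, here is a minimal sketch. The tensor sizes, the random stand-in data, and the `TwoViewNet` module (including its layer sizes and the 3 outputs) are all assumptions made up for illustration, not a recommended architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
batch_size, channels, h, w = 4, 3, 64, 64
side = torch.randn(batch_size, channels, h, w)   # stand-in for side-view frames
front = torch.randn(batch_size, channels, h, w)  # stand-in for front-view frames

# Option 1: concatenate along the channel (depth) dimension.
stacked_channels = torch.cat([side, front], dim=1)  # [batch_size, channels*2, h, w]

# Option 2: concatenate along the height dimension.
stacked_height = torch.cat([side, front], dim=2)  # [batch_size, channels, 2*h, w]


class TwoViewNet(nn.Module):
    """Option 3 (hypothetical): one small conv branch per view, fused later."""

    def __init__(self, channels=3, num_outputs=3):
        super().__init__()
        self.side_branch = self._make_branch(channels)
        self.front_branch = self._make_branch(channels)
        # Each branch yields 16 features, so the fused vector has 32.
        self.head = nn.Linear(2 * 16, num_outputs)

    @staticmethod
    def _make_branch(channels):
        return nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> [batch, 16, 1, 1]
            nn.Flatten(),             # -> [batch, 16]
        )

    def forward(self, side, front):
        # Process each view separately, then combine the activations.
        features = torch.cat(
            [self.side_branch(side), self.front_branch(front)], dim=1
        )
        return self.head(features)


model = TwoViewNet(channels=channels)
out = model(side, front)  # [batch_size, num_outputs]
```

Options 1 and 2 only change the input shape, so the first conv layer has to match it (e.g. `in_channels=channels*2` for option 1). Option 3 keeps the two views apart until the features are merged.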

Thank you so much for your reply. Yes, I am trying to predict the motion in new video frames. My input will be a video frame, and the output will be the shoulder turn, shoulder tilt, shoulder bend… of the player who appears in the video frame.
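
Predicting a few continuous values per frame like this can be framed as a regression problem. Below is a rough sketch of a single training step under that assumption. The 3 targets, the random stand-in data, the tiny model, and the choice of `nn.MSELoss` with Adam are all illustrative guesses, not something taken from the actual system.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumption: 3 targets per frame (e.g. shoulder turn, tilt, bend).
num_targets = 3

# Random stand-ins: 8 samples, two RGB views stacked on the channel dimension.
frames = torch.randn(8, 6, 64, 64)
targets = torch.randn(8, num_targets)

# A deliberately tiny model, only to show the shapes involved.
model = nn.Sequential(
    nn.Conv2d(6, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, num_targets),
)

criterion = nn.MSELoss()  # mean squared error between predictions and targets
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One optimisation step.
optimizer.zero_grad()
pred = model(frames)             # [8, num_targets]
loss = criterion(pred, targets)
loss.backward()
optimizer.step()
```

In practice the random tensors would be replaced by a `Dataset` that returns the paired frames together with the recorded motion numbers for that frame.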