Neural network that predicts position (xy coordinates) of a point based on other objects in an image

Gabriele_Furnari · November 21, 2022, 10:38am

Hi everyone, I want to develop a neural network that predicts the position (xy coordinates, or just the location where the point should be placed) of a point based on the positions of other objects in an image (any objects; for simplicity, let’s use geometric shapes). The idea is that the model has to understand the spatial relation between the objects and the point, so if I give the model a different arrangement of the objects, it has to output the position of the point for that arrangement.

To make it clearer, this is a first step towards a model that extrapolates robot trajectories from videos and adapts them to a similar environment (with similar characteristics, but not exactly the same). So this would be a test to understand whether this approach could work in that setting.

A basic idea could be to give a CNN a set of images with objects in different (but not too different) positions and of different sizes as input, and the same image with the point (whose position needs to be predicted) drawn in it as the label. I’m not sure this could work.

Thank you very much for your help!

JuanFMontesinos · November 21, 2022, 10:43am

It doesn’t work well.
Obtaining coordinates is a whole field in itself. It’s usually posed as an image-to-image translation problem in which scalar fields (heatmaps) are generated, usually with Gaussians centred on the coordinates of the points. Postprocessing then recovers pixel-wise coordinates from these maps. It’s not my field, but you can check YOLO or OpenPose (and newer papers), which may show more powerful methods.
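To make the heatmap idea concrete, here is a minimal sketch in plain Python, not taken from YOLO or OpenPose. It builds a Gaussian target centred on a known point and recovers the pixel coordinates by taking the location of the maximum. The function names, the image size and the `sigma` value are all illustrative assumptions.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Build a height x width grid (list of rows) with a Gaussian bump centred on (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def heatmap_to_xy(heatmap):
    """Postprocessing step: return the (x, y) pixel holding the largest value."""
    best_value, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy

# Hypothetical 32 x 48 image with the point at x=20, y=7.
target = gaussian_heatmap(32, 48, cx=20, cy=7)
print(heatmap_to_xy(target))  # (20, 7)
```

In a real setup the network would be trained to output a map like `target` for each input image, and `heatmap_to_xy` would be applied to the predicted map rather than to the ground truth.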

Gabriele_Furnari · November 22, 2022, 3:23pm

Thank you @JuanFMontesinos for your reply. I’m going to try what you suggested.
Another option: assume I have the coordinates of all the objects in my picture and the coordinates of the point. Do you think it would be possible to predict the point’s coordinates from the object coordinates (so a coordinates → coordinates network)?

Thank you very much!

JuanFMontesinos · November 22, 2022, 4:57pm

Well, the thing is you talk about a “point” but you don’t provide any description.

predicts position (xy coordinates or just the location to place the point) of a point based on the position of other objects (whatever objects, for simplicity let’s use geometric shapes)in an image

What’s the relation between those objects and that point? I think that (coordinates → coordinates) is more feasible, as there exists a semantic link between the two. But it’d be interesting to know the logic behind it.

tiramisuNcustard · November 22, 2022, 6:33pm

I am not too familiar with the area, but have you explored the option of using a GAN?

Gabriele_Furnari · November 23, 2022, 10:26am

I have a right triangle and a point (a small circle) whose coordinates are (x+2, y+1), where x and y are the coordinates of the vertex at the right angle. I have a number of images with different configurations of the triangle (different base and height, different position, etc.), but all following the same pattern.

The idea is to find the relation between the objects and the robot end-effector (the point), so that, given a video of a task, the task is extrapolated and converted to a different but similar environment. This would be a very first, simpler step towards learning a spatial relation between elements.
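For the coordinates → coordinates variant, the toy rule above (point = vertex + (2, 1)) is an affine map, so a single linear layer should be able to learn it. The following is only a sketch under assumptions: it uses plain Python with hand-written gradient descent instead of a deep learning framework, takes just the right-angle vertex as input, and assumes vertex coordinates on a grid in [0, 1]. The names `W`, `b` and `predict`, the learning rate and the step count are illustrative choices.

```python
# Synthetic data following the stated rule: point = (x + 2, y + 1),
# where (x, y) is the right-angle vertex. The 11 x 11 grid in [0, 1]
# is an assumption made for this sketch.
vertices = [(i / 10.0, j / 10.0) for i in range(11) for j in range(11)]
points = [(x + 2.0, y + 1.0) for x, y in vertices]

# One linear layer: prediction = W @ [x, y] + b (W is 2 x 2, b has 2 entries).
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
lr = 0.3           # learning rate (assumed value)
n = len(vertices)

for _ in range(3000):
    grad_W = [[0.0, 0.0], [0.0, 0.0]]
    grad_b = [0.0, 0.0]
    for (x, y), target in zip(vertices, points):
        for k in range(2):  # k = 0: output x, k = 1: output y
            err = W[k][0] * x + W[k][1] * y + b[k] - target[k]
            # Gradient of the mean squared error with respect to each parameter.
            grad_W[k][0] += 2.0 * err * x / n
            grad_W[k][1] += 2.0 * err * y / n
            grad_b[k] += 2.0 * err / n
    for k in range(2):
        W[k][0] -= lr * grad_W[k][0]
        W[k][1] -= lr * grad_W[k][1]
        b[k] -= lr * grad_b[k]

def predict(x, y):
    """Predict the point coordinates from the right-angle vertex (x, y)."""
    return (W[0][0] * x + W[0][1] * y + b[0],
            W[1][0] * x + W[1][1] * y + b[1])

print(predict(0.25, 0.75))  # should be close to (2.25, 1.75)
```

After training, `W` should be close to the identity matrix and `b` close to (2, 1), which is exactly the stated relation. Because this rule is linear, the fitted model also extrapolates outside the training range; a more complex relation between objects and point would likely need hidden layers and more input coordinates.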

Gabriele_Furnari · November 23, 2022, 10:27am

Actually I haven’t. I’ll try. Thank you @tiramisuNcustard