2D grid positional encoding

avalon1511 · October 11, 2023, 10:29pm

Hey everyone,

I have a 100 x 100 grid where certain (x, y) coordinates have a 32-dimensional feature vector. I want to compute a 2D positional encoding for this grid and add it to the feature vector at each of those positions. So, for each (x, y) coordinate:

new feature = original feature + PE(x,y)
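For reference, one common fixed (sinusoidal) 2D encoding gives half of the $d$ channels to x and the other half to y. Each half uses the 1D Transformer-style sin/cos scheme. This is a sketch under that assumption. I haven't checked that it matches tatp22's exact channel ordering. With $d = 32$, $d' = d/2 = 16$:

```latex
\begin{aligned}
\omega_k &= 10000^{-2k/d'}, \qquad k = 0, \dots, d'/2 - 1,\\
\mathrm{PE}(x,y)_{2k} &= \sin(x\,\omega_k), & \mathrm{PE}(x,y)_{2k+1} &= \cos(x\,\omega_k),\\
\mathrm{PE}(x,y)_{d'+2k} &= \sin(y\,\omega_k), & \mathrm{PE}(x,y)_{d'+2k+1} &= \cos(y\,\omega_k),\\
f'(x,y) &= f(x,y) + \mathrm{PE}(x,y) \in \mathbb{R}^{d}.
\end{aligned}
```

Because $\mathrm{PE}(x,y)$ depends only on the coordinates, it can be computed once and reused for every sample.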

but I am confused by the implementation in tatp22/multidim-positional-encoding on GitHub (an implementation of 1D, 2D, and 3D positional encoding in PyTorch and TensorFlow).

The input has to be of the form (batch size, x, y, ch), where ch = 32 and x, y are the grid's spatial dimensions. But the computed PE also has shape (batch size, x, y, ch). How do I extract the PE for specific positions in this case?
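Here is a minimal NumPy sketch of one way to do this. It assumes a fixed sinusoidal encoding (x in the first half of the channels, y in the second). The helpers `sinusoid_1d` and `grid_pe_2d` are hypothetical names and are not tatp22's API. The idea is to build the full (x, y, ch) table once and then use integer-array indexing to pull out one ch-dimensional row per occupied cell. The same indexing works on a (batch size, x, y, ch) output: the encoding is identical across the batch, so `pe[0, xs, ys]` has shape (N, ch). PyTorch supports the same advanced indexing with integer tensors.

```python
import numpy as np

def sinusoid_1d(positions, channels, base=10000.0):
    """Map (N,) positions to (N, channels): sin/cos interleaved at geometric frequencies."""
    assert channels % 2 == 0, "channels must be even"
    k = np.arange(channels // 2)
    inv_freq = base ** (-2.0 * k / channels)       # omega_k, shape (channels/2,)
    angles = np.outer(positions, inv_freq)         # shape (N, channels/2)
    out = np.empty((len(positions), channels))
    out[:, 0::2] = np.sin(angles)                  # even channels: sin
    out[:, 1::2] = np.cos(angles)                  # odd channels: cos
    return out

def grid_pe_2d(size_x, size_y, channels):
    """Full (size_x, size_y, channels) table: first half of channels encodes x, second half y."""
    assert channels % 4 == 0, "channels must be divisible by 4"
    half = channels // 2
    pe_x = sinusoid_1d(np.arange(size_x), half)    # (size_x, half)
    pe_y = sinusoid_1d(np.arange(size_y), half)    # (size_y, half)
    pe = np.empty((size_x, size_y, channels))
    pe[:, :, :half] = pe_x[:, None, :]             # broadcast along y
    pe[:, :, half:] = pe_y[None, :, :]             # broadcast along x
    return pe

# Hypothetical example data: 3 occupied cells on a 100 x 100 grid, 32-dim features each.
xs = np.array([3, 50, 99])
ys = np.array([7, 50, 0])
features = np.random.default_rng(0).normal(size=(3, 32))

pe = grid_pe_2d(100, 100, 32)       # (100, 100, 32), computed once
pe_at_points = pe[xs, ys]           # integer-array indexing -> (3, 32); row i is PE(xs[i], ys[i])
new_features = features + pe_at_points
print(pe_at_points.shape)           # (3, 32)
```

Building the full table is simple and cheap at 100 x 100 x 32. If the grid were much larger and sparsely occupied, you could instead evaluate the same formula only at the occupied coordinates.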

Thanks a lot

J_Johnson · October 11, 2023, 11:41pm

I'm currently traveling in China and can't access GitHub, but I do recall seeing a PyTorch implementation in the Stable Diffusion code, in the UNet module from OpenAI.

Look for a folder called ldm, then one called modules (or submodules), and then the OpenAI folder. There should be a positional encoder in there that covers 2D and may be generalized to 1D and 3D.
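To illustrate what generalizing to 1D and 3D can look like, here is a small NumPy sketch. It is not the Stable Diffusion code and has not been compared against it. It assumes a sinusoidal scheme where the channels are split evenly across the D axes and each axis block interleaves sin and cos. `sinusoidal_pe_nd` is a hypothetical name. It takes coordinates directly, so it can also encode just the occupied cells of a grid without building the full table.

```python
import numpy as np

def sinusoidal_pe_nd(coords, channels, base=10000.0):
    """Encode N points with D coordinates each into an (N, channels) array.

    Channels are split evenly across the D axes; within each axis block,
    even channels hold sin(p * w_k) and odd channels hold cos(p * w_k).
    A 1D array of coordinates is treated as N positions on a single axis.
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]                      # (N,) -> (N, 1)
    n, dims = coords.shape
    if channels % (2 * dims) != 0:
        raise ValueError("channels must be divisible by 2 * number of axes")
    per_axis = channels // dims
    k = np.arange(per_axis // 2)
    inv_freq = base ** (-2.0 * k / per_axis)          # (per_axis/2,)
    out = np.empty((n, channels))
    for d in range(dims):
        angles = coords[:, d:d + 1] * inv_freq        # (N, per_axis/2)
        block = out[:, d * per_axis:(d + 1) * per_axis]  # view into out
        block[:, 0::2] = np.sin(angles)
        block[:, 1::2] = np.cos(angles)
    return out

pe1 = sinusoidal_pe_nd(np.arange(10), 32)             # 1D: (10, 32)
pe2 = sinusoidal_pe_nd([[3, 7], [50, 50]], 32)        # 2D: (2, 32)
pe3 = sinusoidal_pe_nd([[1, 2, 3]], 48)               # 3D: (1, 48); 48 is divisible by 2 * 3
```

With an even split, the number of channels must be a multiple of 2 * D. For example, 32 works for 1D and 2D but not for 3D, which is why the 3D call uses 48.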

avalon1511 · October 11, 2023, 11:41pm

Thanks a lot for this! Appreciate it