2D grid positional encoding

Hey everyone,

I have a 100 x 100 grid where some (x, y) coordinates have a 32-dimensional feature vector. I want to compute a 2D positional encoding for this grid and add it to the feature vector at each of those positions. So, for each such (x, y) coordinate:

new feature = original feature + PE(x,y)
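One common way to define PE(x, y) is to spend half of the d = 32 channels on x and half on y, with interleaved sin/cos pairs at geometrically spaced frequencies. This is an assumption: sinusoidal 2D encodings such as the one in the tatp22 repo follow this general idea, but I haven't checked that the exact channel ordering matches:

$$
\omega_i = 10000^{-2i/(d/2)}, \qquad i = 0, \dots, \tfrac{d}{4} - 1
$$
$$
\mathrm{PE}(x,y)_{2i} = \sin(\omega_i x), \qquad \mathrm{PE}(x,y)_{2i+1} = \cos(\omega_i x)
$$
$$
\mathrm{PE}(x,y)_{d/2+2i} = \sin(\omega_i y), \qquad \mathrm{PE}(x,y)_{d/2+2i+1} = \cos(\omega_i y)
$$

Because PE(x, y) depends only on the coordinates, it can be evaluated at just the positions that actually carry features.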

But I'm confused by the implementation in GitHub - tatp22/multidim-positional-encoding: An implementation of 1D, 2D, and 3D positional encoding in Pytorch and TensorFlow.

The input has to have the shape (batch size, x, y, ch), where ch = 32 and x, y are the grid dimensions (100 each here). But the computed PE also has shape (batch size, x, y, ch). How do I extract the PE for specific positions in that case?
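If I read the tatp22 code correctly, the entry at [b, x, y, :] of its output is the encoding for grid position (x, y), so with an input of shape (1, 100, 100, 32) you would take pe[0, x, y, :] for each occupied (x, y). The sketch below shows the same idea in plain Python so it runs without PyTorch. `pe_2d`, `features` and `grid_pe` are hypothetical names, the sinusoidal scheme is the assumed one described earlier (not necessarily identical to the repo's), and the feature values are made up:

```python
import math

def pe_2d(x, y, channels=32, base=10000.0):
    # Hypothetical helper (not the tatp22 API): sinusoidal 2D encoding for one
    # (x, y). First half of the channels encodes x, second half encodes y;
    # each half interleaves sin/cos pairs at decreasing frequencies.
    if channels % 4 != 0:
        raise ValueError("channels must be divisible by 4")
    half = channels // 2  # channels per axis
    enc = []
    for coord in (x, y):
        for i in range(half // 2):  # one sin/cos pair per frequency
            freq = 1.0 / base ** (2 * i / half)
            enc.append(math.sin(coord * freq))
            enc.append(math.cos(coord * freq))
    return enc

# Made-up sparse data: only some grid cells carry a 32-dim feature vector.
features = {(3, 7): [0.5] * 32, (42, 99): [1.0] * 32}

# Option 1: evaluate the PE only at the occupied positions and add it.
new_features = {
    pos: [f + p for f, p in zip(feat, pe_2d(*pos))]
    for pos, feat in features.items()
}

# Option 2: precompute the full 100 x 100 grid (the repo's (batch, x, y, ch)
# output minus the batch axis) and look positions up by index.
grid_pe = [[pe_2d(x, y) for y in range(100)] for x in range(100)]
looked_up = {pos: grid_pe[pos[0]][pos[1]] for pos in features}
```

Both options give the same vectors; option 1 avoids building the whole grid when only a few cells are occupied. With a PyTorch tensor of shape (1, 100, 100, 32), integer index tensors `xs` and `ys` of length N would pull out all occupied positions at once via `pe[0, xs, ys]`, giving shape (N, 32).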

Thanks a lot

I'm currently traveling in China and can't access GitHub, but I do recall seeing a PyTorch implementation in the Stable Diffusion code, in the UNet module by OpenAI.

Look for a folder called ldm, then one called modules (or submodules), and then the folder for OpenAI. There should be a positional encoder there that covers 2D and possibly generalizes to 1D and 3D.
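To illustrate what generalizing to 1D and 3D usually means for sinusoidal encodings, here is a sketch under the assumption that channels are split evenly across the axes. It is not the ldm/OpenAI code (I haven't checked what that file contains), and `sinusoidal_pe_nd` is a hypothetical name:

```python
import math

def sinusoidal_pe_nd(coords, channels, base=10000.0):
    # Hypothetical helper, not the ldm/OpenAI implementation: split `channels`
    # evenly across the axes and give each axis its own block of interleaved
    # sin/cos pairs. len(coords) of 1, 2 or 3 gives a 1D, 2D or 3D encoding.
    per_axis = channels // len(coords)
    if channels % len(coords) != 0 or per_axis % 2 != 0:
        raise ValueError("channels must split into an even count per axis")
    enc = []
    for coord in coords:
        for i in range(per_axis // 2):
            freq = 1.0 / base ** (2 * i / per_axis)
            enc.append(math.sin(coord * freq))
            enc.append(math.cos(coord * freq))
    return enc

pe_1d = sinusoidal_pe_nd((7,), 32)       # 1D: all 32 channels for one axis
pe_2d = sinusoidal_pe_nd((3, 7), 32)     # 2D: 16 channels per axis
pe_3d = sinusoidal_pe_nd((3, 7, 1), 48)  # 3D: 16 channels per axis
```

The only thing that changes between dimensions is how many axes share the channel budget, which is why one encoder can cover 1D, 2D and 3D. Note that 32 channels do not split evenly over 3 axes, so the 3D call uses 48.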

Thanks a lot for this! Appreciate it 🙂