I have `n` points, each with an associated color:

```
import torch

n = 100
colors = torch.rand((n, 3))  # one RGB color per point
points = torch.randint(0, 10, (n, 2))  # integer (x, y) coordinates
x_min, x_max = points[:, 0].min().item(), points[:, 0].max().item()
y_min, y_max = points[:, 1].min().item(), points[:, 1].max().item()
```

I want to discretize the points' colors onto a `target` grid:

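```
target = torch.zeros((5, 5, 3))
eps = torch.finfo(torch.float32).eps  # Bucketizing from the "left"
# Caveat: float32 eps (~1.2e-7) is smaller than the float32 spacing near values
# like 9.0, so x_max + eps can round back to x_max inside linspace
x_steps = torch.linspace(x_min, x_max + eps, target.shape[0] + 1)
y_steps = torch.linspace(y_min, y_max + eps, target.shape[1] + 1)
```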

I can determine which points fall within each grid location with `bucketize`

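```
# right=True gives left-closed cells: steps[i-1] <= x < steps[i]
# Subtract 1 for 0-based cell indices; clamp so points on the top edge land in the last cell
x_bucket_indices = (torch.bucketize(points[:, 0], x_steps, right=True) - 1).clamp(0, target.shape[0] - 1)
y_bucket_indices = (torch.bucketize(points[:, 1], y_steps, right=True) - 1).clamp(0, target.shape[1] - 1)
```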
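With its default `right=False`, `torch.bucketize` returns `i` such that `boundaries[i-1] < v <= boundaries[i]`, so only a value exactly on the lowest edge lands in index 0 and every other value is shifted up by one. A small check on made-up values (the `boundaries` and `values` tensors are illustrative, not from the original), assuming 0-based cell indices are what the loop expects:

```python
import torch

# Three edges -> two cells: [0, 1) and [1, 2)
boundaries = torch.tensor([0.0, 1.0, 2.0])
values = torch.tensor([0.0, 0.5, 1.0, 1.9])

# Default right=False: i such that boundaries[i-1] < v <= boundaries[i]
left = torch.bucketize(values, boundaries)
# right=True: i such that boundaries[i-1] <= v < boundaries[i]
right = torch.bucketize(values, boundaries, right=True)

print(left.tolist())         # [0, 1, 1, 2]
print(right.tolist())        # [1, 1, 2, 2]
# Shifting the right=True result down by one gives 0-based cell indices
print((right - 1).tolist())  # [0, 0, 1, 1]
```

With `right=False`, a loop over `range(2)` here would never visit index 2, which is where the largest value ends up.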

The issue comes when updating `target` based on the number of colors (0, 1, or up to `n`) that may be present in each of its cells/buckets/elements.

```
for ix in range(target.shape[0]):
    row_inds = torch.where(x_bucket_indices == ix)[0]
    for iy in range(target.shape[1]):
        col_inds = torch.where(y_bucket_indices == iy)[0]
        cell_inds = intersection(row_inds, col_inds)
        # Collect the color of the points that fall within this cell
        target[ix, iy, :] = pool_color(colors[cell_inds, :])
```

(Utilizing these two helper functions)

```
# Define how to "pool" the colors in each cell of the grid(/bucket)
def pool_color(colors):
    if len(colors) == 0:  # Interpolate? (Later)
        return torch.zeros((3,))
    if len(colors) == 1:  # Single point, single color
        return colors[0]
    if len(colors) > 1:  # Average of all colors in cell
        return torch.mean(colors, dim=0)


# Define a helper function to find the intersection of unique vectors
def intersection(x, y):
    # x[1, 2, 3, 4], y[3, 4] -> [3, 4]
    combined = torch.cat((x, y))
    unq, cnt = combined.unique(return_counts=True)
    return unq[cnt > 1]
```
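Because `torch.where` returns unique indices, the intersection of the row and column index sets should equal the indices where both bucket conditions hold at once, which needs only one mask per cell. A small check on hypothetical bucket indices, comparing `torch.isin` (an alternative to the `unique`-based helper) with a single combined mask; it also reflects that, for a non-empty cell, every branch of `pool_color` amounts to a mean:

```python
import torch

# Hypothetical bucket indices for 6 points on a 2x2 grid
x_bucket_indices = torch.tensor([0, 0, 1, 1, 0, 1])
y_bucket_indices = torch.tensor([0, 1, 0, 1, 1, 1])
colors = torch.rand((6, 3))

ix, iy = 0, 1
# Intersection of the row and column index sets, as in the loop
row_inds = torch.where(x_bucket_indices == ix)[0]
col_inds = torch.where(y_bucket_indices == iy)[0]
via_isin = row_inds[torch.isin(row_inds, col_inds)]

# Same set from one combined boolean mask
via_mask = torch.where((x_bucket_indices == ix) & (y_bucket_indices == iy))[0]

# The mean of a single color is that color, so one mean covers 1 or more points
pooled = colors[via_mask].mean(dim=0)
print(via_mask.tolist())  # [1, 4]
```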

**Is there a way to further vectorize the operations within the for loops such that they are performed on the GPU?**

I think one of the biggest issues is the lack of something akin to `map` in PyTorch, and the potentially “very” ragged/jagged tensor that results, depending on how `pool_color` is applied.
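The jagged structure may not need to be materialized at all. If each point's `(ix, iy)` cell is flattened into a single integer id, the per-cell count (0, 1, or many) is a single `torch.bincount` call and stays a dense `(5, 5)` tensor. A minimal sketch with made-up indices, assuming the bucket indices are already 0-based and in range:

```python
import torch

nx, ny = 5, 5
# Hypothetical 0-based bucket indices for 8 points
x_bucket_indices = torch.tensor([0, 0, 4, 2, 2, 2, 4, 1])
y_bucket_indices = torch.tensor([0, 0, 4, 3, 3, 3, 0, 1])

# One integer id per cell: row-major flattening of (ix, iy)
flat_cell = x_bucket_indices * ny + y_bucket_indices

# Points per cell as one dense tensor; empty cells simply get 0
counts = torch.bincount(flat_cell, minlength=nx * ny).reshape(nx, ny)
print(counts[2, 3].item())  # 3
```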

The problem itself seems very parallelizable, though: given a set of indices, pull the colors from shared memory, apply `pool_color`, and write the result into `target` in shared memory. I’m just not sure how to implement this any further in torch.
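One loop-free sketch of that idea is a scatter-style mean: flatten `(ix, iy)` into one cell id per point, sum colors per cell with `Tensor.index_add_`, count points per cell with `torch.bincount`, and divide. `pool_colors_to_grid` is a hypothetical helper that assumes 0-based, in-range bucket indices (for example from `bucketize(..., right=True) - 1` plus clamping) and leaves empty cells at zero like `pool_color`; it does not interpolate them.

```python
import torch


def pool_colors_to_grid(x_idx, y_idx, colors, nx, ny):
    """Hypothetical helper: mean color per grid cell without Python loops.

    x_idx, y_idx: 0-based int64 cell indices per point, shape (n,)
    colors: (n, C) float tensor on the same device as the indices
    Returns an (nx, ny, C) tensor; empty cells are 0.
    """
    flat = x_idx * ny + y_idx  # one cell id per point
    sums = torch.zeros(nx * ny, colors.shape[1], dtype=colors.dtype, device=colors.device)
    sums.index_add_(0, flat, colors)  # per-cell color sums
    counts = torch.bincount(flat, minlength=nx * ny)  # per-cell point counts
    # Divide by max(count, 1): empty cells keep a zero sum and stay 0
    means = sums / counts.clamp(min=1).unsqueeze(1).to(colors.dtype)
    return means.reshape(nx, ny, colors.shape[1])


# Compare against a plain loop on random data (CPU here)
torch.manual_seed(0)
n, nx, ny = 100, 5, 5
colors = torch.rand((n, 3))
x_idx = torch.randint(0, nx, (n,))
y_idx = torch.randint(0, ny, (n,))

target = pool_colors_to_grid(x_idx, y_idx, colors, nx, ny)

reference = torch.zeros((nx, ny, 3))
for ix in range(nx):
    for iy in range(ny):
        mask = (x_idx == ix) & (y_idx == iy)
        if mask.any():
            reference[ix, iy] = colors[mask].mean(dim=0)

print(torch.allclose(target, reference, atol=1e-6))  # True
```

All of these are standard tensor ops, so the helper should run unchanged on CUDA tensors. On CUDA, `index_add_` accumulates with atomics, so float sums may differ in the last bits between runs. Newer PyTorch versions also provide `Tensor.scatter_reduce_` with `reduce="mean"` as a built-in alternative.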