Is it possible to further vectorize the operations of this colored point torch.bucketization?

I have n points, each with an associated color:

import torch

n = 100
colors = torch.rand((n, 3))
points = torch.randint(0, 10, (n, 2))
x_min, x_max = torch.min(points[:, 0]).data, torch.max(points[:, 0]).data
y_min, y_max = torch.min(points[:, 1]).data, torch.max(points[:, 1]).data

I want to discretize the points’ colors onto some target grid:

target = torch.zeros((5, 5, 3))
eps = torch.finfo(torch.float32).eps # Bucketizing from the "left"
x_steps = torch.linspace(x_min, x_max + eps, target.shape[0] + 1)
y_steps = torch.linspace(y_min, y_max + eps, target.shape[1] + 1)

I can determine which points fall within each grid location with torch.bucketize:

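# right=True gives boundaries[i-1] <= v < boundaries[i]; subtract 1 for 0-based cell
# indices and clamp so the maximum value lands in the last cell
x_bucket_indices = (torch.bucketize(points[:, 0].float(), x_steps, right=True) - 1).clamp(max=target.shape[0] - 1)
y_bucket_indices = (torch.bucketize(points[:, 1].float(), y_steps, right=True) - 1).clamp(max=target.shape[1] - 1)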
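
For reference, here is a small check of how torch.bucketize assigns indices to values that sit exactly on a boundary, with the default right=False and with right=True. The boundaries and values are made up for illustration; the point is that the returned indices run from 0 to len(boundaries), so they may need shifting or clamping before being used as grid cell indices.

```python
import torch

# Illustrative boundaries for a 2-cell axis and a few test values
boundaries = torch.tensor([0.0, 2.5, 5.0])
values = torch.tensor([0.0, 1.0, 2.5, 5.0])

# Default (right=False): boundaries[i-1] < v <= boundaries[i]
left = torch.bucketize(values, boundaries)

# right=True: boundaries[i-1] <= v < boundaries[i]
right = torch.bucketize(values, boundaries, right=True)

print(left.tolist())   # [0, 1, 1, 2]
print(right.tolist())  # [1, 1, 2, 3]
```

With right=True, subtracting 1 gives 0-based cell indices, and only a value equal to the last boundary falls outside the grid.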

The difficulty comes when updating target, because each of its cells/buckets may contain zero, one, or many colors.

for ix in range(target.shape[0]):
    row_inds = torch.where(x_bucket_indices == ix)[0]
    for iy in range(target.shape[1]):
        col_inds = torch.where(y_bucket_indices == iy)[0]
        cell_inds = intersection(row_inds, col_inds)

        # Collect the color of the points that fall within this cell
        target[ix, iy, :] = pool_color(colors[cell_inds, :])

(Utilizing these two helper functions)

# Define how to "pool" the colors in each cell of the grid(/bucket)
def pool_color(colors):
    if len(colors) == 0: # Interpolate? (Later)
        return torch.zeros((3,))
    if len(colors) == 1: # Single point, single color
        return colors[0]
    if len(colors) > 1:  # Average of all colors in cell
        return torch.mean(colors, dim=0)

# Define a helper function to find the intersection of unique vectors
def intersection(x, y):
    # x[1, 2, 3, 4], y[3, 4] -> [3, 4]
    combined = torch.cat((x, y))
    unq, cnt = combined.unique(return_counts=True)
    return unq[cnt > 1]

Is there a way to further vectorize the operations within the for loops such that they are performed on the GPU?

I think one of the biggest issues is not having access to something akin to map in PyTorch, together with the potentially “very” ragged/jagged tensor that results, depending on how pool_color is applied.

Still, the problem itself seems very parallelizable: given a set of indices, pull the colors from shared memory, apply pool_color, and write the result to shared memory in target. I’m just not sure how to take this any further in torch.

If you compute a global bucket index (by combining x and y) you can then use the third-party PyTorch Scatter library.
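
A minimal sketch of the global-index idea follows. It uses the built-in index_add_ and torch.bincount instead of the PyTorch Scatter library, so that it runs with plain PyTorch; that substitution is a choice made for illustration. The function name pool_colors_to_grid and the choice to leave empty cells at zero are assumptions that mirror pool_color above.

```python
import torch

def pool_colors_to_grid(points, colors, grid_shape):
    """Average the colors of all points that land in each grid cell.

    Hypothetical helper: empty cells are left at zero, like pool_color.
    """
    nx, ny = grid_shape
    pts = points.to(torch.float32)

    # Cell boundaries along each axis (nx + 1 and ny + 1 edges)
    x_steps = torch.linspace(pts[:, 0].min().item(), pts[:, 0].max().item(),
                             nx + 1, device=pts.device)
    y_steps = torch.linspace(pts[:, 1].min().item(), pts[:, 1].max().item(),
                             ny + 1, device=pts.device)

    # right=True: boundaries[i-1] <= v < boundaries[i]; shift to 0-based cells
    # and clamp so the maximum value falls in the last cell
    ix = (torch.bucketize(pts[:, 0], x_steps, right=True) - 1).clamp(0, nx - 1)
    iy = (torch.bucketize(pts[:, 1], y_steps, right=True) - 1).clamp(0, ny - 1)

    # Combine x and y into one global (flattened) bucket index
    flat = ix * ny + iy

    # Sum the colors per cell, count the points per cell, then divide
    sums = torch.zeros(nx * ny, colors.shape[1],
                       dtype=colors.dtype, device=colors.device)
    sums.index_add_(0, flat, colors)
    counts = torch.bincount(flat, minlength=nx * ny).clamp(min=1)
    return (sums / counts.unsqueeze(1)).view(nx, ny, -1)

# Three points: two share the first cell, one sits in the last cell
points = torch.tensor([[0, 0], [0, 0], [9, 9]])
colors = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
target = pool_colors_to_grid(points, colors, (2, 2))
```

All operations here are batched tensor calls with no Python loop over cells, so they should run on the GPU if points and colors are both placed there.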

Best regards


Thanks Thomas. I think that may work for my case. I’ve played around with it a bit this morning. I’m just struggling a bit with generalizing it to the 2D case.

I think one of the issues may be, but I am not yet certain, that the src tensor in the scatter operation may only be allowed to be (n,) in shape without any more axes. For example, how would the operation know on what dimension to average over if a tensor of (n, q, r, s) was provided?
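
On the shape question, a small sketch with the built-in index_add_ (not the PyTorch Scatter library itself) shows one way extra axes can be handled: the index runs along a single dimension, here dim 0, and every trailing dimension is carried through unchanged. The helper name segment_mean is made up for this example. As far as I understand, the scatter functions in PyTorch Scatter take a similar dim argument, but please check its documentation rather than rely on that.

```python
import torch

def segment_mean(src, index, num_segments):
    """Mean over dim 0 of the rows of src that share the same index.

    Hypothetical helper: src may have any number of trailing dimensions.
    """
    out = torch.zeros((num_segments,) + tuple(src.shape[1:]),
                      dtype=src.dtype, device=src.device)
    # Accumulate along dim 0 only; trailing dims are untouched
    out.index_add_(0, index, src)

    # Per-segment counts, reshaped to broadcast over the trailing dims
    counts = torch.bincount(index, minlength=num_segments).clamp(min=1)
    counts = counts.view(-1, *([1] * (src.dim() - 1))).to(src.dtype)
    return out / counts

# src has shape (n, q, r) = (4, 2, 3); index has shape (n,)
src = torch.arange(24, dtype=torch.float32).view(4, 2, 3)
index = torch.tensor([0, 0, 2, 2])
pooled = segment_mean(src, index, num_segments=3)
```

Here pooled has shape (3, 2, 3): segment 0 is the mean of src[0] and src[1], segment 1 is empty and stays zero, and segment 2 is the mean of src[2] and src[3].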

I’ll keep looking into it. Just an FYI for others who may have suggestions or input: I have not yet solved the problem.