Optimizing tensor operations by replacing for loops

I’m trying to fuse segmentation masks and then downsample the result to a smaller image size. The input is a tensor of m masks, each 1024 x 1024 (<m x 1024 x 1024>), and the output should be <1 x 64 x 64>.

Each mask corresponds to a segmented object with a class id (up to 91 using MsCoco labels), and these ids are stored in a tensor of size m. To “fuse” the masks, each mask is multiplied by its id and the results are summed together (currently with a for loop). I’ve tried doing this with matrix multiplication, but have not been able to make it work properly.
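
A minimal sketch of one way the fusing step could be vectorized, assuming the masks are already stacked in a single <m x H x W> tensor and the ids in a 1-D tensor of length m. The function name `fuse_masks` and the toy 4 x 4 inputs are hypothetical and only there for illustration. It relies on broadcasting rather than matrix multiplication: the ids are reshaped to <m x 1 x 1>, multiplied with the masks, and summed over the mask dimension.

```python
import torch

def fuse_masks(masks: torch.Tensor, classes: torch.Tensor) -> torch.Tensor:
    """Fuse <m x H x W> binary masks into one <1 x H x W> map of class ids."""
    # Reshape the ids to <m x 1 x 1> so they broadcast over every pixel of each mask.
    weighted = masks.long() * classes.long().view(-1, 1, 1)
    # Summing over the mask dimension reproduces the accumulation done by the loop.
    return weighted.sum(dim=0, keepdim=True)

# Toy input (hypothetical): 3 masks of 4 x 4 instead of m masks of 1024 x 1024.
masks = torch.zeros(3, 4, 4, dtype=torch.bool)
masks[0, :2, :2] = True
masks[1, 2:, 2:] = True
masks[2, :2, 2:] = True
classes = torch.tensor([5, 17, 90])

fused = fuse_masks(masks, classes)  # shape <1 x 4 x 4>, dtype long
```

As in the loop version, pixels covered by more than one mask end up with the sum of the overlapping ids, so this sketch does not change that behavior.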

To reduce the size of the fused mask, it is split into non-overlapping windows (16 x 16 pixels for a 64 x 64 output) and the most common element (the mode) in each window is kept. I’m unsure how to vectorize this operation either. Any help would be appreciated! This is the current version of the code:

import torch

def createFeatureMaps(masks, num_x_windows=64, num_y_windows=64):
    feature_maps = []
    for current_mask in masks:
        # load stored tensors.
        recovered_masks = deserializeTensor(current_mask["masks"])
        recovered_classes = deserializeTensor(current_mask["classes"])
        # Convert boolean tensor to binary
        raw_masks = recovered_masks.long()
        # Create fused mask
        fused_masks = torch.zeros((1,recovered_masks.size(1),recovered_masks.size(2))).long().cuda()
        # Converting masks to correct class ID
        # Is there a vectorized way of doing this?
        for i in range(len(recovered_classes)):
            fused_masks += raw_masks[i:i+1] * recovered_classes[i]

        merged_masks = torch.zeros((num_x_windows,num_y_windows))
        window_size_x = recovered_masks.size(1) // num_x_windows
        window_size_y = recovered_masks.size(2) // num_y_windows
        # Here is the other operation that should be vectorized. It runs windows over the fused mask and gets the most common value (mode)
        for x in range(num_x_windows):
            for y in range(num_y_windows):
                mask_window = fused_masks[0,window_size_x*x:window_size_x*(x+1),window_size_y*y:window_size_y*(y+1)]
                merged_masks[x,y] = torch.mode(mask_window.flatten()).values
        #paths.append(os.path.join(current_mask['dir_id'],'matterport_skybox_images' ,current_mask['scan_id']))
        current_mask["feature_maps"] = serializeTensor(merged_masks)
        # Collect the result so the returned list is not always empty
        feature_maps.append(merged_masks)
    return feature_maps
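
A possible vectorized version of the window step, sketched under the assumption that the fused mask is passed in as a 2-D <H x W> tensor (for example `fused_masks[0]`). The function name `window_mode` and the toy input are hypothetical. The idea is to reshape the image into non-overlapping blocks, flatten each block into the last dimension, and call `torch.mode` once along that dimension instead of once per window.

```python
import torch

def window_mode(fused: torch.Tensor, num_x_windows: int = 64, num_y_windows: int = 64) -> torch.Tensor:
    """Downsample an <H x W> label map by keeping the mode of each window."""
    height, width = fused.shape
    window_size_x = height // num_x_windows
    window_size_y = width // num_y_windows
    # Drop any remainder rows/columns, as the loop version does (a no-op for 1024 / 64).
    fused = fused[: window_size_x * num_x_windows, : window_size_y * num_y_windows]
    # <H x W> -> <nx x wx x ny x wy>: split each axis into (window index, offset in window).
    blocks = fused.reshape(num_x_windows, window_size_x, num_y_windows, window_size_y)
    # Move the two offsets to the end and flatten them: <nx x ny x wx*wy>.
    blocks = blocks.permute(0, 2, 1, 3).reshape(num_x_windows, num_y_windows, -1)
    # One mode per window, taken along the flattened last dimension.
    return torch.mode(blocks, dim=-1).values

# Toy input (hypothetical): a 4 x 4 label map reduced to 2 x 2 with 2 x 2 windows.
fused = torch.tensor([[1, 1, 0, 2],
                      [1, 0, 2, 2],
                      [3, 3, 0, 0],
                      [3, 4, 0, 5]])
small = window_mode(fused, num_x_windows=2, num_y_windows=2)
```

Unlike `merged_masks` in the loop version, which is a float tensor, the result here keeps the dtype and device of the input, so a cast may be needed before serializing it.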