Suppose the batch size is B and each image has N rectangular proposals, so there is a tensor proposals of
shape = [B, N, 4] in xyxy format. Each proposal has scores for C classes, which gives a tensor scores of
shape = [B, N, C]. I want to get a mask of
shape = [B, C, H, W] from proposals and scores. The value of each pixel in the mask is the sum of the scores of all proposals that cover that position:
$$ s_k(x, y) = \sum_{i \,:\, (x, y) \in D_i} \mathrm{scores}_{i,k} $$
where s is the mask, D_i is the i-th proposal, and k is the class index.
The only way I could figure out is to use a loop, as below:

    import torch

    scores = torch.randint(10, [2, 2, 20]) / 10.0  # [B, N, C]
    batch_size, proposal_number, num_class = scores.shape
    height, width = 50, 50
    proposals = torch.LongTensor([[[0, 0, 20, 20], [10, 12, 28, 40]],
                                  [[5, 9, 20, 40], [6, 20, 41, 38]]])  # [B, N, 4], xyxy

    masks = []
    for i in range(batch_size):
        mask = torch.zeros(num_class, height, width)
        for j in range(proposal_number):
            proposal = proposals[i, j].to(torch.int)
            # xyxy = (x1, y1, x2, y2); rows of the mask are y, columns are x
            mask[:, proposal[1]: proposal[3], proposal[0]: proposal[2]] += scores[i, j, :].view(-1, 1, 1)
        masks.append(mask)
    masks = torch.stack(masks, dim=0)  # [B, C, H, W]

But when N is large, this method is very slow. Is there a faster way to do this?
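
One possible direction, offered only as a sketch: the indicator of an axis-aligned box factors into a row indicator times a column indicator, so the accumulation can be written as a single `torch.einsum` over the proposal axis instead of a Python loop. The function name `accumulate_proposal_scores` is hypothetical, and the code assumes xyxy means (x1, y1, x2, y2) with x indexing the width axis, half-open ranges as in slicing, and non-negative integer coordinates.

```python
import torch


def accumulate_proposal_scores(proposals, scores, height, width):
    """Hypothetical helper: proposals [B, N, 4] (xyxy ints), scores [B, N, C] -> [B, C, H, W]."""
    x1, y1, x2, y2 = proposals.unbind(dim=-1)  # each has shape [B, N]
    ys = torch.arange(height, device=proposals.device)  # [H]
    xs = torch.arange(width, device=proposals.device)   # [W]
    # rows_in[b, n, h] is 1 where row h lies in [y1, y2); cols_in likewise for x.
    rows_in = ((ys >= y1[..., None]) & (ys < y2[..., None])).to(scores.dtype)  # [B, N, H]
    cols_in = ((xs >= x1[..., None]) & (xs < x2[..., None])).to(scores.dtype)  # [B, N, W]
    # Sum over proposals n of score * row indicator * column indicator.
    return torch.einsum("bnc,bnh,bnw->bchw", scores, rows_in, cols_in)


scores = torch.randint(10, [2, 2, 20]) / 10.0  # [B, N, C]
proposals = torch.LongTensor([[[0, 0, 20, 20], [10, 12, 28, 40]],
                              [[5, 9, 20, 40], [6, 20, 41, 38]]])  # [B, N, 4]
masks = accumulate_proposal_scores(proposals, scores, 50, 50)
print(masks.shape)  # torch.Size([2, 20, 50, 50])
```

This trades the loop for two intermediate tensors of shape [B, N, H] and [B, N, W], which is usually much smaller than materializing a full [B, N, H, W] coverage mask. Whether it is actually faster for a given N and device is something to measure rather than assume.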