I want to transform a feature map of shape [batch, ch, n, m], which is divided into grids, into a pixel-based one.
Any help, please?
Could you explain your use case a bit and also what you mean by “pixel-based”?
import math
import torch
import torch.nn.functional as F

def attention(keyA, keyB, value, mask=None, dropout=None):
    # Scaled dot-product attention between two sets of features.
    d_k = keyA.size(-1)
    scores = torch.matmul(keyA, keyB.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    p_attn = F.softmax(scores, dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn
This function computes attention between two input feature maps. What I actually want is to turn the grid feature maps into denser ones, without dividing the spatial dimensions into n*m regions, so that attention is computed per pixel rather than per region. Thanks.
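If I understand correctly, you can get pixel-level attention by flattening the spatial dimensions so that every pixel becomes one token of length `ch`, then feeding the result to your `attention` function. Here is a minimal sketch of that idea; `pixel_attention` is a hypothetical helper name I made up, and the `attention` function is reproduced so the snippet runs standalone:

```python
import math
import torch
import torch.nn.functional as F

def attention(keyA, keyB, value, mask=None, dropout=None):
    # Scaled dot-product attention (same as the function above).
    d_k = keyA.size(-1)
    scores = torch.matmul(keyA, keyB.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    p_attn = F.softmax(scores, dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn

def pixel_attention(feat_a, feat_b):
    # feat_a, feat_b: [batch, ch, H, W] feature maps.
    b, c, h, w = feat_a.shape
    # Flatten the spatial grid so each pixel is one token: [batch, H*W, ch].
    qa = feat_a.flatten(2).transpose(1, 2)
    qb = feat_b.flatten(2).transpose(1, 2)
    # Every pixel of feat_a attends to every pixel of feat_b.
    out, attn = attention(qa, qb, qb)  # attn: [batch, H*W, H*W]
    # Restore the spatial layout: [batch, ch, H, W].
    return out.transpose(1, 2).reshape(b, c, h, w), attn

x = torch.randn(2, 16, 8, 8)
y = torch.randn(2, 16, 8, 8)
out, attn = pixel_attention(x, y)
print(out.shape)   # torch.Size([2, 16, 8, 8])
print(attn.shape)  # torch.Size([2, 64, 64])
```

Note the attention matrix is [H*W, H*W] per batch element, so memory grows quadratically with the number of pixels; for large feature maps you may need to restrict attention to local windows or downsample first.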