Fast batch scalar/tensor multiplication

I have two tensors: a binary tensor A of size (Nx1) and a tensor B of size (NxCxHxW). I want to multiply them so that each element of A serves as a scalar multiplier for the corresponding (CxHxW) slice of B, hence the output should be of size (NxCxHxW). Given that A is binary, what is the fastest way to do this in a forward pass? Both A and B are outputs of other modules. I tried expanding A to the same size as B, but that is time-consuming since B can be large.

Reshape A from Nx1 to Nx1x1x1 and multiply:

C = A.view(-1,1,1,1) * B
print(C.size()) # NxCxHxW

This is called broadcasting.
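A minimal runnable sketch of the broadcast multiply, with hypothetical shapes (N=4, C=3, H=2, W=2) chosen just for illustration:

```python
import torch

N, C, H, W = 4, 3, 2, 2
A = (torch.rand(N, 1) > 0.5).float()  # binary (Nx1) tensor of 0.0 / 1.0
B = torch.randn(N, C, H, W)

# view(-1, 1, 1, 1) reshapes A to (N, 1, 1, 1); broadcasting expands the
# three singleton dims to match B, so each A[i] scales the slice B[i].
out = A.view(-1, 1, 1, 1) * B
print(out.size())  # torch.Size([4, 3, 2, 2])
```

Because the multiply is an ordinary differentiable op, gradients flow back through both A and B, which matters here since both are module outputs.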

I don’t know of a way to exploit the fact that A is binary for performance.
However, many codebases simply use a FloatTensor of 0.0s and 1.0s as a mask and multiply it with the other FloatTensor.

I especially wanted to avoid the trivial solution (broadcasting), or at least find a better alternative. Since A is binary, I was trying to use something like index_select, but it doesn’t seem to work during the forward pass.

index_select would work, but it serves a different purpose:

  • You will get a tensor of dynamic shape (?xCxHxW) instead of (NxCxHxW)
  • Backprop through the integer index doesn’t work.
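To illustrate the first point, here is a small sketch (with hypothetical shapes N=4, C=3, H=2, W=2 and a hard-coded A) showing that index_select returns a tensor whose first dimension depends on the data in A, not on N:

```python
import torch

N, C, H, W = 4, 3, 2, 2
A = torch.tensor([1.0, 0.0, 1.0, 1.0]).view(N, 1)
B = torch.randn(N, C, H, W)

# nonzero() yields the positions where A is 1; index_select then keeps
# only those slices of B, so the result has a data-dependent shape.
idx = A.view(-1).nonzero().view(-1)
selected = B.index_select(0, idx)
print(selected.size())  # torch.Size([3, 3, 2, 2]) — not (4, 3, 2, 2)
```

Gradients do flow to the selected slices of B, but the integer index itself (and hence A) receives no gradient, which is the second point above.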

I had the same question before, and IMHO broadcasting is the best way. Masking is very common in deep learning, and this is how people do it. Maybe it can be improved, but it’s not worth the effort.