Selective element-wise efficient multiplication?

I have two tensors of shapes (32, 512, 7, 7) and (32, 512) respectively, where 32 is the batch size and 512 is the number of channels. Let us call them A and B.
Now what I need to do is this:

        for i in range(batch_size):
            for j in range(channel_size):
                A[i, j] = A[i, j] * B[i, j]  # scale the 7x7 block by the scalar B[i, j]

In other words, I want each element of the 7x7 matrix (for the i-th batch and j-th channel) to be multiplied by the corresponding scalar in B for the same i-th batch and j-th channel.

Using for loops becomes highly inefficient when I increase the batch size.
Is there any other way of doing the same?

Any help will be appreciated. :slight_smile:

Broadcasting: A *= B[:, :, None, None]
Note that this (like your version) changes A in place, but the usual * will work similarly.
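To illustrate, here is a minimal sketch that checks the broadcast product against the double loop. The random tensors and the names `expected` and `C` are only placeholders for this example, not part of the original question.

```python
import torch

batch_size, channel_size = 32, 512
A = torch.randn(batch_size, channel_size, 7, 7)
B = torch.randn(batch_size, channel_size)

# Reference result computed with the explicit double loop
expected = A.clone()
for i in range(batch_size):
    for j in range(channel_size):
        expected[i, j] *= B[i, j]

# Out-of-place: B is viewed as (32, 512, 1, 1) and broadcast over the 7x7 dims
C = A * B[:, :, None, None]

# In-place variant: overwrites A with the same values
A *= B[:, :, None, None]
```

Indexing with `None` only adds size-1 dimensions to a view of B, so B itself is not copied; the out-of-place form allocates one new tensor with the shape of A.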

Best regards


Thanks @tom for your reply. Is there any other solution to this problem? I have shown a simple example here. My actual implementation is very similar, except that it requires an out-of-place implementation (C = A * B[:, :, None, None]). In my case, the out-of-place version of the solution given by @tom is highly memory inefficient for a batch size of 64. The error message is:

RuntimeError: CUDA out of memory. Tried to allocate 256.00 GiB (GPU 0; 23.65 GiB total capacity; 2.12 GiB already allocated; 20.16 GiB free; 2.20 GiB reserved in total by PyTorch)

Well, a single for loop might be a good compromise.
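A rough sketch of what that could look like, looping over the batch dimension only. The helper name `scale_per_channel_loop` is made up for illustration, and whether this avoids the out-of-memory error depends on the actual shapes in the real code, which are not shown in this thread.

```python
import torch

def scale_per_channel_loop(A, B):
    """Hypothetical helper: out-of-place A * B[:, :, None, None], one sample at a time."""
    C = torch.empty_like(A)  # output buffer with the same shape as A
    for i in range(A.shape[0]):
        # B[i] has shape (channels,); view it as (channels, 1, 1) so it
        # broadcasts over the spatial dimensions of A[i]
        C[i] = A[i] * B[i][:, None, None]
    return C

# Example with the shapes from the question (batch size 64)
A = torch.randn(64, 512, 7, 7)
B = torch.randn(64, 512)
C = scale_per_channel_loop(A, B)
```

The temporary created in each iteration covers a single sample, so intermediate memory scales with one sample rather than with the whole batch.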