Rendering using a batch of normals and a batch of coefficient matrices

Hello, I am trying to write code to do basic rendering using:
a batch of normal maps, Normals, with size
[batch_size, Width, Height, channels=3],
and a coefficient matrix, M, with size [batch_size, 4, 4, channels=3].

So, to put it simply, I want to do a matrix multiplication between every normal map and its corresponding matrix M. I used `bmm()`; more specifically, after permuting the axes I have:
`Normals.shape = [batch_size, Width*Height, channels=3]` for the normal maps, and `M.shape = [batch_size, 16, channels=3]` for M.
But I got a batch of images with shape `[batch_size, Width*Height, 16]`, and after taking the first 3 channels, `[batch_size, Width*Height, :3]`,
I don't get the expected result. Any suggestion or help would be appreciated.
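
A minimal sketch of the steps described above, using small made-up sizes and random tensors, may help pin down what `bmm()` is computing here. The variable names (`normals_flat`, `M_flat`, `n_h`, `shaded`) and the use of `reshape` are assumptions for illustration. The last part is only one possible reading of M (a per-channel 4x4 matrix applied to the homogeneous normal as a quadratic form); the post does not state this, so treat it as a guess rather than the intended method.

```python
import torch

# Small illustrative sizes (assumed, not from the post).
batch_size, width, height = 2, 5, 4

normals = torch.randn(batch_size, width, height, 3)  # [B, W, H, 3]
M = torch.randn(batch_size, 4, 4, 3)                 # [B, 4, 4, 3]

# Flatten the spatial dimensions and the 4x4 dimensions.
normals_flat = normals.reshape(batch_size, width * height, 3)  # [B, W*H, 3]
M_flat = M.reshape(batch_size, 16, 3)                          # [B, 16, 3]

# bmm needs [B, n, k] @ [B, k, m], so M_flat is transposed to [B, 3, 16].
out = torch.bmm(normals_flat, M_flat.transpose(1, 2))          # [B, W*H, 16]

# Each out[b, p, k] is the dot product, over the 3 channels, of the normal
# at pixel p with row k of M_flat (that is, M[b, k // 4, k % 4, :]).
b, p, k = 0, 7, 5
manual = (normals_flat[b, p] * M_flat[b, k]).sum()
assert torch.allclose(out[b, p, k], manual, atol=1e-5)

# Slicing [..., :3] therefore keeps only the products with
# M[b, 0, 0, :], M[b, 0, 1, :] and M[b, 0, 2, :]; the other 13 are dropped.
first3 = out[..., :3]                                          # [B, W*H, 3]

# ASSUMPTION (not stated in the post): each M[b, :, :, c] is a 4x4 matrix
# applied to the homogeneous normal n = (nx, ny, nz, 1) as n^T M n,
# giving one value per colour channel c.
ones = torch.ones(batch_size, width * height, 1)
n_h = torch.cat([normals_flat, ones], dim=-1)                  # [B, W*H, 4]
shaded = torch.einsum("bpi,bijc,bpj->bpc", n_h, M, n_h)        # [B, W*H, 3]
image = shaded.reshape(batch_size, width, height, 3)           # [B, W, H, 3]
```

If the quadratic-form reading is not what M represents, the first half still shows why the `[..., :3]` slice is unlikely to be an RGB image: the 16 output entries mix the colour channels together instead of keeping them separate.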