I want to compute $x^TAx$ where $x$ is a vector and $A$ is a matrix (i.e. a bilinear form). Although `nn.Bilinear` would work, in my case $A$ is block-diagonal, so I wanted to create an `nn.Module` that computes this with much less memory and computation by storing only the diagonal blocks.

For example, below is the reference computation using `nn.functional.bilinear(x, x, A, b)` (the functional form of `nn.Bilinear`; here $x$'s batch dim is 3 and the output dim is 2):

```
x = tensor([[ 0.,  1.,  2.,  3.],
            [ 4.,  5.,  6.,  7.],
            [ 8.,  9., 10., 11.]])
A = tensor([[[ 0.,  1.,  0.,  0.],
             [ 2.,  3.,  0.,  0.],
             [ 0.,  0., -0., -1.],
             [ 0.,  0., -2., -3.]],
            [[ 0.,  2.,  0.,  0.],
             [ 4.,  6.,  0.,  0.],
             [ 0.,  0., -0., -2.],
             [ 0.,  0., -4., -6.]]])
b = tensor([0., 1.])
```

where,

`x.shape = torch.Size([3, 4])`, `weight.shape = torch.Size([2, 4, 4])`, `bias.shape = torch.Size([2])`

and the result of `nn.Bilinear` is

```
tensor([[ -42., -83.],
[-138., -275.],
[-234., -467.]])
```
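For reference, the example above can be reproduced end-to-end with the functional API (no module construction needed):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[0., 1., 2., 3.],
                  [4., 5., 6., 7.],
                  [8., 9., 10., 11.]])
A = torch.tensor([[[0., 1., 0., 0.],
                   [2., 3., 0., 0.],
                   [0., 0., -0., -1.],
                   [0., 0., -2., -3.]],
                  [[0., 2., 0., 0.],
                   [4., 6., 0., 0.],
                   [0., 0., -0., -2.],
                   [0., 0., -4., -6.]]])
b = torch.tensor([0., 1.])

# out[n, k] = x[n] @ A[k] @ x[n] + b[k]
out = F.bilinear(x, x, A, b)
# -> [[-42., -83.], [-138., -275.], [-234., -467.]]
```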

Initially, I tried the following approach: the quadratic form of a block-diagonal matrix equals the sum of the quadratic forms of its blocks, i.e. if $x = [x_1, x_2]$ and $A = [[A_1, 0], [0, A_2]]$, then $x^TAx = x_1^TA_1x_1 + x_2^TA_2x_2$. So I split $x$ into chunks matching the block sizes, reshaped so that the batch dimension was multiplied by the number of blocks, and computed the bilinear form with the enlarged batch, but it didn't work as expected.
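For what it's worth, the chunk-and-sum idea does work when expressed as a single `einsum`. Below is a sketch assuming all blocks share the same size `s`; the function name `block_bilinear` and the `(K, B, s, s)` block layout are my own invention, not an existing API:

```python
import torch

def block_bilinear(x, blocks, bias=None):
    """Bilinear form x^T A x where A is block-diagonal, storing only the blocks.

    x:      (N, B*s)      batch of inputs
    blocks: (K, B, s, s)  the B diagonal s-by-s blocks of each of the K
                          output weight matrices (assumes equal-sized blocks)
    bias:   (K,) optional
    returns (N, K)
    """
    K, B, s, _ = blocks.shape
    xc = x.view(x.shape[0], B, s)  # split each input into per-block chunks
    # out[n, k] = sum_b  x_{n,b}^T  A_{k,b}  x_{n,b}, batched in one call
    out = torch.einsum('nbi,kbij,nbj->nk', xc, blocks, xc)
    if bias is not None:
        out = out + bias
    return out

# The example above: each 4x4 weight matrix has two 2x2 diagonal blocks
blocks = torch.tensor([[[[ 0.,  1.], [ 2.,  3.]],
                        [[-0., -1.], [-2., -3.]]],
                       [[[ 0.,  2.], [ 4.,  6.]],
                        [[-0., -2.], [-4., -6.]]]])   # (K=2, B=2, 2, 2)
x = torch.arange(12.).view(3, 4)
out = block_bilinear(x, blocks, bias=torch.tensor([0., 1.]))
# -> [[-42., -83.], [-138., -275.], [-234., -467.]]
```

This stores `K*B*s*s` weights instead of `K*(B*s)^2` and never materializes the full matrices.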

Any suggestions or ideas would be greatly appreciated!

(I thought a for loop of `torch.matmul` over the diagonal blocks could work, but I am afraid that would not be parallel and hence slow.)
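For comparison, the loop version might look roughly like the sketch below (same hypothetical `(K, B, s, s)` layout as assumed above). Each `torch.matmul` inside the loop is still batched over the N samples and K outputs, so with a small number of blocks the Python-loop overhead is usually minor:

```python
import torch

def block_bilinear_loop(x, blocks, bias=None):
    """Same block-diagonal bilinear form, looping over the B blocks."""
    K, B, s, _ = blocks.shape
    xc = x.view(x.shape[0], B, s)
    out = x.new_zeros(x.shape[0], K)
    for b in range(B):
        xb = xc[:, b, :]            # (N, s) chunk of x for block b
        Ab = blocks[:, b]           # (K, s, s) block b of every output matrix
        y = torch.matmul(xb, Ab)    # broadcasts to (K, N, s)
        out += (y * xb).sum(-1).T   # x_b^T A_{k,b} x_b, accumulated per block
    if bias is not None:
        out = out + bias
    return out

blocks = torch.tensor([[[[ 0.,  1.], [ 2.,  3.]],
                        [[-0., -1.], [-2., -3.]]],
                       [[[ 0.,  2.], [ 4.,  6.]],
                        [[-0., -2.], [-4., -6.]]]])   # (K=2, B=2, 2, 2)
x = torch.arange(12.).view(3, 4)
out = block_bilinear_loop(x, blocks, bias=torch.tensor([0., 1.]))
# -> [[-42., -83.], [-138., -275.], [-234., -467.]]
```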