For a linear model (e.g., logistic regression), the gradient of the loss function with respect to the model parameters over training dataset A is a vector g1. Similarly, the gradient of the loss over another dataset B is a vector g2. Since both gradients are vectors, we can easily measure their similarity with the inner product ⟨g1, g2⟩.

How can we extend this inner product between two gradients to neural networks, where the gradient is no longer a single vector but a collection of tensors (one per layer, e.g. weight matrices and bias vectors)? Should we stack all of the per-parameter gradients together into one long vector?
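Yes, the standard approach is exactly that: flatten each per-parameter gradient tensor and concatenate them into a single vector, then take the inner product as in the linear case. Below is a minimal sketch with numpy; the `flatten_grads` helper and the layer shapes are hypothetical, and the random arrays stand in for gradients that a framework (e.g. PyTorch's `torch.autograd.grad`) would actually compute on batches A and B.

```python
import numpy as np

def flatten_grads(grads):
    """Stack per-parameter gradient tensors into one flat vector.

    `grads` is a list of arrays, one per model parameter
    (e.g. [dL/dW1, dL/db1, dL/dW2, dL/db2]).
    """
    return np.concatenate([g.ravel() for g in grads])

# Hypothetical per-layer gradient shapes for a tiny two-layer network:
# W1 (4x3), b1 (3,), W2 (3x1), b2 (1,).
shapes = [(4, 3), (3,), (3, 1), (1,)]
rng = np.random.default_rng(0)

# Stand-ins for the gradients computed on dataset A and dataset B.
g1 = [rng.normal(size=s) for s in shapes]
g2 = [rng.normal(size=s) for s in shapes]

v1, v2 = flatten_grads(g1), flatten_grads(g2)

# Inner product of the flattened gradients; optionally normalize
# to a cosine similarity so the scale of the gradients drops out.
inner = v1 @ v2
cosine = inner / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(inner, cosine)
```

Note that concatenation preserves the inner product: the dot product of the flattened vectors equals the sum of the per-parameter inner products, so the order of flattening does not matter as long as it is consistent between the two gradients.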