Sorry for this kind of question; maybe my linear algebra is just too weak.
I am trying to implement a new optimizer strategy, and the author gives this formula.
Here gk is the gradient tensor and pk is a tensor with the same shape as gk. For example, if the gradient tensor has shape (c, m, n), then its transpose has shape (n, m, c).
How can I multiply the two tensors to get a scalar result? In that paper, the author also states that pk is different from 0 and that the product is smaller than 0.
I am not sure I understand this part of the paper.
Typically, you can treat the parameters as a 1D array (vectorize the conv and fully connected layer parameters), and the gradients will correspondingly also be a 1D array.
Then you can compute
p^T.g as an ordinary inner product.
Would this work?
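A minimal sketch of that suggestion in NumPy (shapes and the choice p_k = -g_k are illustrative, not from the paper):

```python
import numpy as np

# Illustrative gradient tensor of shape (c, m, n) and a direction
# tensor of the same shape. p_k = -g_k is just an example direction
# that is guaranteed to satisfy the descent condition.
g_k = np.random.randn(3, 4, 5)
p_k = -g_k

# p_k^T g_k as a scalar: vectorize both tensors and take the
# ordinary inner product.
inner = np.dot(p_k.ravel(), g_k.ravel())

# The paper's condition as stated in the question: p_k != 0 and
# the product p_k^T g_k < 0.
is_descent = np.any(p_k != 0) and inner < 0
print(inner, is_descent)
```

With p_k = -g_k the inner product is minus the squared norm of g_k, so the condition holds whenever the gradient is nonzero.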
You mean I should flatten the gradient tensor?
Or, for gk of shape (c, m, n) and pk transposed to shape (c, n, m), multiply them to get a result tensor of shape (c, n, n) (which equals zero if all its elements are zero)?
Yes, it seems that way to me.
But I am not sure about the context of this operation; you might know it better.
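For what it's worth, the two readings are consistent: the flattened inner product equals the sum of the traces of the per-channel (n, n) blocks, because trace(A^T B) is the sum of the elementwise products of A and B. A sketch in NumPy (shapes illustrative):

```python
import numpy as np

g_k = np.random.randn(3, 4, 5)   # (c, m, n)
p_k = np.random.randn(3, 4, 5)   # same shape as g_k

# Per-channel matrix product: transpose the last two axes, giving
# (c, n, m) @ (c, m, n) -> (c, n, n).
per_channel = np.matmul(np.transpose(p_k, (0, 2, 1)), g_k)

# Flatten-then-dot interpretation: a single scalar.
flat = np.dot(p_k.ravel(), g_k.ravel())

# Summing the traces of the (n, n) blocks recovers the same scalar,
# since trace(p_c^T g_c) sums the elementwise products of channel c.
trace_sum = np.trace(per_channel, axis1=1, axis2=2).sum()
print(per_channel.shape, np.allclose(flat, trace_sum))
```

So flattening loses nothing as far as the scalar p^T g is concerned; the batched product just keeps per-channel structure you would then reduce anyway.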
At first I was about to flatten it, but I was not sure whether that matches what the author means; maybe I have to dig deeper into the mathematics.
Anyway, thank you for your help and the very fast answer.
Okay, flattening is the solution; that is how they implement it in this repo.