How to calculate A*A^T efficiently

I want to calculate A * A^T, where A is an n x m matrix, so the result is an n x n matrix. Here m is very large, for example 10^6. The following code does this straightforwardly:

B = torch.matmul(A, A.T)
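For reference, a runnable version of this baseline might look like the sketch below. The sizes and the float64 dtype are assumptions for illustration (m is scaled down from 10^6 so it runs quickly). It also confirms that the n x n result is symmetric, which is the property the rest of the question is about.

```python
import torch

# Small stand-in sizes (assumption): the real m is ~10^6, scaled down here.
n, m = 4, 10_000
A = torch.randn(n, m, dtype=torch.float64)

# Plain GEMM: computes every entry of the n x n result, both triangles included.
B = torch.matmul(A, A.T)

print(B.shape)                 # torch.Size([4, 4])
print(torch.allclose(B, B.T))  # True: A * A^T is symmetric
```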

However, this can't be the fastest method. First, it doesn't take the symmetry of the result into account (exploiting it could save about half of the computation). Second, it may involve extra memory accesses, because A and A^T share the same storage.
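To make both points concrete, here is a minimal sketch, assuming a float64 tensor on CPU. `gram_upper_blocked` and `block` are hypothetical names, not PyTorch APIs. The helper computes only the row-block tiles on or above the block diagonal and copies their transposes into the lower triangle, so about half of the off-diagonal tile products are skipped. It also checks that `A.T` is a view that shares `A`'s storage rather than a copy.

```python
import torch

def gram_upper_blocked(A: torch.Tensor, block: int) -> torch.Tensor:
    """Hypothetical helper: compute A @ A.T by evaluating only the
    diagonal and upper-triangular row blocks, then mirroring them."""
    n = A.shape[0]
    B = A.new_empty((n, n))
    for i0 in range(0, n, block):
        i1 = min(i0 + block, n)
        for j0 in range(i0, n, block):      # j0 >= i0: upper triangle only
            j1 = min(j0 + block, n)
            tile = A[i0:i1] @ A[j0:j1].T    # (i1-i0) x (j1-j0) tile
            B[i0:i1, j0:j1] = tile
            if j0 != i0:                    # mirror off-diagonal tiles
                B[j0:j1, i0:i1] = tile.T
    return B

A = torch.randn(10, 1000, dtype=torch.float64)

# The symmetric sketch agrees with the plain product.
print(torch.allclose(gram_upper_blocked(A, block=3), A @ A.T))  # True

# A.T is a view: same storage, only the strides are swapped.
print(A.T.data_ptr() == A.data_ptr())  # True
print(A.T.is_contiguous())             # False
```

In practice the Python loop and the smaller matrix products may well make this slower than a single `torch.matmul` unless n is large. BLAS libraries expose this exact pattern as the SYRK (symmetric rank-k update) routine, which fills only one triangle of the result.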

Are there other functions that can do this multiplication more efficiently? If not, I think PyTorch should provide one, because calculating A * A^T is an important and basic operation in practice.