nn.Linear latency report?

Hi PyTorch!

I was wondering whether anyone has explored performance statistics for nn.Linear. For example: how much slower is it on CPU than on GPU? How does the run time depend on the dimension sizes? How long does it take to load the input matrix for the operation, and how long to load the weight matrix? How long does backpropagation take?

I think I could do the analysis myself, but I want to see what is out there first.
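
For reference, here is a rough sketch of the kind of measurement I have in mind. The helper name `time_linear`, the layer sizes, the batch size, and the repeat count are all placeholders I made up, and it only times the forward and backward passes, not the data transfer. The backward time also includes the gradient of the `.sum()` I use to get a scalar.

```python
import time

import torch
import torch.nn as nn


def time_linear(in_features, out_features, batch_size, device="cpu", repeats=20):
    """Return (mean forward seconds, mean backward seconds) for one nn.Linear."""
    layer = nn.Linear(in_features, out_features).to(device)
    x = torch.randn(batch_size, in_features, device=device)

    def sync():
        # CUDA kernels run asynchronously, so wait for them before reading the clock.
        if device.startswith("cuda"):
            torch.cuda.synchronize()

    # Warm-up runs so one-time setup costs are not counted.
    for _ in range(3):
        layer(x).sum().backward()
    sync()

    fwd_total = 0.0
    bwd_total = 0.0
    for _ in range(repeats):
        layer.zero_grad()
        sync()
        t0 = time.perf_counter()
        out = layer(x)
        sync()
        t1 = time.perf_counter()
        out.sum().backward()
        sync()
        t2 = time.perf_counter()
        fwd_total += t1 - t0
        bwd_total += t2 - t1
    return fwd_total / repeats, bwd_total / repeats


# Compare CPU and (if present) GPU across a few square layer sizes.
devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
results = {}
for device in devices:
    for dim in (64, 256):
        fwd, bwd = time_linear(dim, dim, batch_size=32, device=device)
        results[(device, dim)] = (fwd, bwd)
        print(f"{device} dim={dim}: forward {fwd * 1e6:.1f} us, backward {bwd * 1e6:.1f} us")
```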

Thank you,

Kovek