How does nn.Linear work in C++ for multi-dimensional input? (torch._C._nn.linear)

This thread points to the CUDA implementations in case that’s helpful.
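
In case it helps with the multi-dimensional part of the question: the documented behaviour of `F.linear` is that an input of shape `(*, in_features)` produces an output of shape `(*, out_features)`, so the layer is applied to the last dimension only and every leading dimension acts as a batch dimension. Below is a minimal sketch that checks this from Python. The helper `linear_via_flatten` is a hypothetical reference written for this post, not the actual C++ code path, and the real backend may dispatch differently (for example to a fused matmul-plus-bias kernel) depending on the input shape and device.

```python
import torch
import torch.nn.functional as F


def linear_via_flatten(x, weight, bias=None):
    """Hypothetical reference (not the ATen implementation):
    collapse all leading dims, do one 2-D matmul, restore the shape."""
    in_features = x.shape[-1]
    out_features = weight.shape[0]

    # (d0, d1, ..., in_features) -> (d0 * d1 * ..., in_features)
    x2d = x.reshape(-1, in_features)

    # weight is stored as (out_features, in_features), hence the transpose
    y2d = x2d @ weight.t()
    if bias is not None:
        y2d = y2d + bias  # bias broadcasts over the flattened batch dim

    # (d0 * d1 * ..., out_features) -> (d0, d1, ..., out_features)
    return y2d.reshape(*x.shape[:-1], out_features)


torch.manual_seed(0)
x = torch.randn(2, 3, 4, 5)   # three leading dims, in_features = 5
weight = torch.randn(7, 5)    # out_features = 7
bias = torch.randn(7)

expected = F.linear(x, weight, bias)
actual = linear_via_flatten(x, weight, bias)

print(tuple(expected.shape))
print(torch.allclose(expected, actual, atol=1e-4))
```

If the two results agree, the N-D case is numerically the same as a plain 2-D matrix multiply over the flattened leading dimensions, which is usually all you need to know when reading the C++ or CUDA side.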