Here is what nn.Linear does, but I am confused. I think it should be A^T*x and not the other way around.
The documentation refers to the implemented method as seen here.
Your suggestion won't work with plain matmul, as x has the batch dimension in dim0.
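To make the shape argument concrete, here is a minimal sketch. The sizes (batch of 4, 3 input features, 2 output features) and the variable names are arbitrary choices for illustration. It checks that `nn.Linear` matches `x @ A^T + b` and that `A^T @ x` raises a shape error when `x` is batched along dim0:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only, not taken from the original question.
batch_size, in_features, out_features = 4, 3, 2
linear = nn.Linear(in_features, out_features)
x = torch.randn(batch_size, in_features)  # batch dimension is dim0

# The weight A is stored as [out_features, in_features].
weight_shape = tuple(linear.weight.shape)

# Documented form: y = x A^T + b, shapes [4, 3] @ [3, 2] -> [4, 2]
manual = torch.matmul(x, linear.weight.T) + linear.bias
matches = torch.allclose(linear(x), manual, atol=1e-6)

# Suggested form A^T x: [3, 2] @ [4, 3], inner dimensions 2 and 4 do not match.
try:
    torch.matmul(linear.weight.T, x)
    swapped_fails = False
except RuntimeError:
    swapped_fails = True

# Column-vector convention: A x^T works once the batch is moved to the last dim.
column_form = (torch.matmul(linear.weight, x.T)).T + linear.bias
column_matches = torch.allclose(linear(x), column_form, atol=1e-6)
```

If you prefer to think in column vectors, the equivalent expression is `A @ x.T` (transposed back afterwards), since the weight is already stored as `[out_features, in_features]`.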