Why use torch.bmm instead of Linear()?

whoab · April 17, 2019, 6:48am

As I understand it, Linear layers already handle batching properly.

justusschock · April 17, 2019, 7:31am

I think the fundamental difference is that torch.bmm is a mathematical operation on two batches of matrices, while torch.nn.Linear is a layer with internal state (a learnable weight and bias) that it applies to its input with a matrix multiplication of its own.
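To make the distinction concrete, here is a minimal sketch. The tensor sizes and the variable names (`w_per_item`, `w_shared`, and so on) are made up for illustration and are not from the original posts. It shows that nn.Linear shares one stored weight across the whole batch, whereas torch.bmm is stateless and multiplies each batch element by its own matrix.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes only (assumed for this example).
batch, n, d_in, d_out = 4, 3, 5, 2
x = torch.randn(batch, n, d_in)

# nn.Linear is a module that owns one weight (d_out x d_in) and one bias,
# and applies them to every element of the batch.
linear = torch.nn.Linear(d_in, d_out)
y_linear = linear(x)  # shape: (batch, n, d_out)

# The same computation written as plain math: x @ W^T + b,
# with W broadcast over the batch dimension.
y_manual = x @ linear.weight.T + linear.bias

# torch.bmm is a stateless function on two 3-D tensors:
# batch element i of x is multiplied by matrix i of w_per_item.
w_per_item = torch.randn(batch, d_in, d_out)
y_bmm = torch.bmm(x, w_per_item)  # shape: (batch, n, d_out)

# bmm reproduces the Linear output only if the shared weight is
# explicitly repeated once per batch element.
w_shared = linear.weight.T.unsqueeze(0).expand(batch, -1, -1)
y_bmm_shared = torch.bmm(x, w_shared) + linear.bias
```

So Linear is the natural choice when one learned transform should apply to every sample, and torch.bmm fits cases where each sample comes with its own matrix, for example one computed from the data itself.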