Why use torch.bmm instead of Linear()?

As I understand it, Linear layers already handle batching properly.

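I think the fundamental difference is that torch.bmm is a stateless mathematical operation, while torch.nn.Linear is a layer with internal state: a learned weight and bias that it applies to every element of the batch (computing x @ weight.T + bias). torch.bmm instead multiplies two batches of matrices, so each batch element can be multiplied by its own matrix.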
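To make that difference concrete, here is a minimal sketch. The tensor names and shapes are made up for illustration and do not come from the original question. It shows that nn.Linear shares one weight across the batch, that the same result can be reproduced with torch.bmm by expanding that weight, and that torch.bmm can also use a different matrix per batch element, which a single Linear layer cannot do.

```python
import torch

torch.manual_seed(0)
batch, n, d_in, d_out = 4, 3, 5, 2  # hypothetical sizes for the example

x = torch.randn(batch, n, d_in)

# nn.Linear holds state: one weight of shape (d_out, d_in) and one bias,
# shared by every element of the batch.
linear = torch.nn.Linear(d_in, d_out)
y_linear = linear(x)  # shape (batch, n, d_out)

# The same computation written with torch.bmm: the shared weight has to be
# expanded along the batch dimension, and the bias added by hand.
w_shared = linear.weight.t().unsqueeze(0).expand(batch, d_in, d_out)
y_bmm = torch.bmm(x, w_shared) + linear.bias

# What bmm can do that a single Linear cannot: a different matrix per batch element.
w_per_batch = torch.randn(batch, d_in, d_out)
y_per_batch = torch.bmm(x, w_per_batch)  # shape (batch, n, d_out)
```

So if the same transformation applies to every sample, Linear is the natural choice; torch.bmm is useful when the matrices themselves vary across the batch, for example when they are computed from the input.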
