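As I understand it, linear layers already handle batching properly: torch.nn.Linear applies the same weight and bias across every leading (batch) dimension of its input.
I think the fundamental difference is that torch.bmm is a mathematical operation, while torch.nn.Linear is a layer with internal state (its forward pass may be implemented via torch.bmm or a similar matrix multiplication).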
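To make the comparison concrete, here is a minimal sketch. The shapes and variable names are made up for illustration, and it does not claim to show how torch.nn.Linear is implemented internally. It only checks that a linear layer applied to a batched input gives the same result as torch.bmm with the layer's weight copied across the batch.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch, seq, in_features, out_features = 4, 3, 5, 2
x = torch.randn(batch, seq, in_features)

# The layer owns its state: a weight of shape (out_features, in_features) and a bias.
linear = torch.nn.Linear(in_features, out_features)
y_linear = linear(x)  # batching is handled by the layer itself

# torch.bmm has no state and needs one matrix per batch element,
# so the shared weight is expanded along the batch dimension first.
w = linear.weight.t().unsqueeze(0).expand(batch, -1, -1)  # (batch, in_features, out_features)
y_bmm = torch.bmm(x, w) + linear.bias

same = torch.allclose(y_linear, y_bmm, atol=1e-5)
print(y_linear.shape, same)
```

So torch.bmm is the tool to reach for when every batch element has its own matrix, while torch.nn.Linear covers the case where one learned weight is shared across the whole batch.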