PyTorch uses the mentioned shape (`seq_len, batch, features`) for performance reasons. If you prefer to have the batch dimension in dim0, you can set `batch_first=True` when creating the module, which can make it easier to port code that expects batch-first tensors.
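
For example, a minimal sketch using `nn.LSTM` (the same flag also exists on `nn.RNN` and `nn.GRU`; the specific module and sizes here are just for illustration):

```python
import torch
import torch.nn as nn

# Default layout: input is (seq_len, batch, input_size)
rnn = nn.LSTM(input_size=10, hidden_size=20)
x = torch.randn(5, 3, 10)  # seq_len=5, batch=3, features=10
out, _ = rnn(x)
print(out.shape)  # torch.Size([5, 3, 20])

# With batch_first=True: input is (batch, seq_len, input_size)
rnn_bf = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x_bf = torch.randn(3, 5, 10)  # batch=3, seq_len=5, features=10
out_bf, _ = rnn_bf(x_bf)
print(out_bf.shape)  # torch.Size([3, 5, 20])
```

Note that `batch_first` only changes the layout of the input and output tensors; the hidden states (`h_0`/`c_0`) keep the `(num_layers, batch, hidden_size)` layout either way.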