PyTorch vs TensorFlow Input Shapes

I’ve been messing around with a Transformer using Time2Vec embeddings and have gone down a rabbit hole concerning input tensor shapes. PyTorch appears to use a consistent convention across its sequence-model API, expecting (seq_len, batch_size, features) for models such as nn.Transformer, nn.LSTM, and nn.GRU. TensorFlow inverts the first two dimensions, expecting (batch_size, seq_len, features).
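For illustration, here’s a minimal sketch of the two defaults side by side (the layer and hidden sizes are arbitrary, chosen just to show the layouts):

```python
import torch
import torch.nn as nn
import tensorflow as tf

seq_len, batch_size, features = 10, 32, 64

# PyTorch default: time first -- (seq_len, batch_size, features)
pt_lstm = nn.LSTM(input_size=features, hidden_size=128)  # batch_first=False by default
pt_out, _ = pt_lstm(torch.randn(seq_len, batch_size, features))
print(pt_out.shape)  # torch.Size([10, 32, 128])

# TensorFlow/Keras default: batch first -- (batch_size, seq_len, features)
tf_lstm = tf.keras.layers.LSTM(128, return_sequences=True)
tf_out = tf_lstm(tf.random.normal((batch_size, seq_len, features)))
print(tf_out.shape)  # (32, 10, 128)
```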

It seems like small API discrepancies like this could cause subtle, hard-to-debug errors for developers switching from one library to the other. This makes me wonder about the rationale, on either side, for preferring one input shape over the other.

PyTorch uses the mentioned shape for performance reasons. If you prefer to have the batch dimension in dim0, you can set batch_first=True when creating the module, which can make it easier to port code.
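For example, a minimal sketch with nn.LSTM (assuming a recent PyTorch; batch_first has long been available on nn.LSTM/nn.GRU, but was only added to nn.Transformer in 1.9):

```python
import torch
import torch.nn as nn

batch_size, seq_len, features = 32, 10, 64

# batch_first=True accepts the TensorFlow-style (batch, seq, features) layout
lstm = nn.LSTM(input_size=features, hidden_size=128, batch_first=True)
x = torch.randn(batch_size, seq_len, features)
out, (h, c) = lstm(x)
print(out.shape)  # torch.Size([32, 10, 128]) -- batch stays in dim0
print(h.shape)    # torch.Size([1, 32, 128]) -- hidden state keeps (layers, batch, hidden)
```

Note that batch_first only affects the input and output tensors: the returned hidden and cell states keep the (num_layers, batch, hidden) layout regardless.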
