I’ve been messing around with a Transformer using Time2Vec embeddings and have gone down a rabbit hole concerning input tensor shapes. PyTorch’s recurrent modules such as nn.GRU expect (seq_len, batch_size, features) by default (though passing batch_first=True flips the first two dimensions), while TensorFlow/Keras inverts that convention, expecting (batch_size, seq_len, features).
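To make the discrepancy concrete, here is a minimal PyTorch sketch (the dimension sizes 5/3/8/16 are arbitrary choices for illustration) showing the same data fed to a seq-first GRU and a batch-first GRU:

```python
import torch
import torch.nn as nn

seq_len, batch_size, features, hidden = 5, 3, 8, 16

# Default PyTorch convention: (seq_len, batch_size, features)
gru = nn.GRU(input_size=features, hidden_size=hidden)
x_seq_first = torch.randn(seq_len, batch_size, features)
out, _ = gru(x_seq_first)
print(out.shape)  # torch.Size([5, 3, 16]) -> (seq_len, batch, hidden)

# batch_first=True matches the TensorFlow/Keras convention:
# (batch_size, seq_len, features)
gru_bf = nn.GRU(input_size=features, hidden_size=hidden, batch_first=True)
x_batch_first = x_seq_first.transpose(0, 1)  # swap seq and batch dims
out_bf, _ = gru_bf(x_batch_first)
print(out_bf.shape)  # torch.Size([3, 5, 16]) -> (batch, seq, hidden)
```

Note that only the input and output tensors are affected by batch_first; the hidden state returned by the GRU keeps the shape (num_layers, batch, hidden) in both cases.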
It seems like a small discrepancy like this between the two APIs could cause subtle, hard-to-debug errors for developers switching from one library to the other. This makes me wonder what the rationale is, on either side, for preferring one input shape over the other.