TransformerEncoder truncates output when some token positions are masked by `src_key_padding_mask` across the batch

Update: this is related to the `enable_nested_tensor` option. When it is set to `False`, the output comes back with the expected shape. I have run into many problems when using nested tensors and the fast path (including a previous post), so I think it would be very helpful if PyTorch set this option to `False` by default. It is just not as stable as it should be.
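
For reference, here is a minimal sketch of the workaround described above: constructing the encoder with `enable_nested_tensor=False` so the nested-tensor fast path is skipped when the padding mask differs across the batch. The model sizes, batch, and mask values are illustrative, not the original repro.

```python
import torch
import torch.nn as nn

# Disable the nested-tensor conversion so the encoder keeps dense tensors.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2,
                                enable_nested_tensor=False)

src = torch.randn(3, 10, 64)                     # (batch, seq_len, d_model)
pad_mask = torch.zeros(3, 10, dtype=torch.bool)  # True marks padded positions
pad_mask[0, 7:] = True                           # sequences padded to different lengths
pad_mask[1, 5:] = True

out = encoder(src, src_key_padding_mask=pad_mask)
print(out.shape)  # stays (3, 10, 64) instead of being truncated
```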