What precisions are you using with and without TP? If the dtype used for communication differs from the dtype used for computation, the numerics can differ.
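To illustrate the kind of effect I mean, here is a minimal pure-Python sketch. It is not your setup and makes no real TP or `torch.distributed` calls: `to_fp16`, `to_fp32`, and `simulated_all_reduce_sum` are hypothetical helpers that only emulate rounding the per-rank partial sums to a given dtype during the reduction.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def simulated_all_reduce_sum(partials, cast):
    """Sum per-rank partial results, rounding to the 'communication dtype'
    (emulated by `cast`) after every addition. Hypothetical helper, not a
    real collective."""
    acc = cast(0.0)
    for p in partials:
        acc = cast(acc + cast(p))
    return acc


# Pretend three TP ranks each produced one partial sum of the same output.
partials = [2048.0, 0.5, 0.5]

reduced_fp16 = simulated_all_reduce_sum(partials, to_fp16)
reduced_fp32 = simulated_all_reduce_sum(partials, to_fp32)

print(reduced_fp16, reduced_fp32)  # prints "2048.0 2049.0"
```

The fp16 reduction drops both 0.5 contributions because representable fp16 values are spaced 2 apart at that magnitude, while the fp32 reduction keeps them. A mismatch like this between the TP and non-TP runs would be enough to explain small differences in the outputs.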