Tensor parallel numeric mismatch

tianyu · June 18, 2025, 12:06am

What precisions do you use with and without TP? Using different dtypes for communication or computation can produce different numerics.
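
To make the dtype point concrete, here is a minimal sketch in plain Python (no PyTorch) that imitates one row-parallel dot product split across two ranks. Everything in it is an assumption for illustration: the helper `to_bf16`, the toy inputs, and the idea that the partial sums are cast to bfloat16 before the reduction. It is not your actual setup, just a way to see how a lower-precision reduction can shift the result.

```python
import struct


def to_bf16(x: float) -> float:
    """Hypothetical helper: round a finite float to the nearest bfloat16 value."""
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 bits that bfloat16 drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(x, w):
    """Plain dot product, accumulated in Python's double precision."""
    return sum(a * b for a, b in zip(x, w))


# Toy activation and weight vectors (made-up values).
x = [0.1 * (i + 1) for i in range(8)]
w = [0.01 * (i + 1) for i in range(8)]

# Without TP: one device computes the whole dot product.
full = dot(x, w)

# With TP=2: each "rank" computes a partial sum over its shard.
p0 = dot(x[:4], w[:4])
p1 = dot(x[4:], w[4:])

# Reduction that keeps the partial sums in high precision.
reduced_high = p0 + p1

# Reduction where the partial sums and the result are rounded to bfloat16.
reduced_bf16 = to_bf16(to_bf16(p0) + to_bf16(p1))

print("no TP:               ", full)
print("TP, high-precision:  ", reduced_high)
print("TP, bf16 reduction:  ", reduced_bf16)
print("abs diff (high):     ", abs(reduced_high - full))
print("abs diff (bf16):     ", abs(reduced_bf16 - full))
```

If the two runs differ by roughly the rounding error of the lower-precision dtype, the communication or accumulation dtype is a likely suspect, so comparing both runs with the same dtype everywhere would be a reasonable first check.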