Converting float16 tensor to numpy causes rounding

gholste · February 26, 2024, 4:07am

I am running mixed-precision training, converting my model’s outputs (float16) to numpy, and storing those outputs for later evaluation. I just noticed that all numpy arrays are rounded to 2 decimal places! Why would this happen?! This is very easily reproducible:

```python
import torch
x = torch.tensor([64.6250, 61.7812, 61.8750, 64.7500, 64.5000], dtype=torch.float16)
print(x)  # tensor([64.6250, 61.7812, 61.8750, 64.7500, 64.5000], dtype=torch.float16)
print(repr(x.numpy()))  # array([64.6 , 61.78, 61.88, 64.75, 64.5 ], dtype=float16)
```

The solution I’ve found is simply to run x.float().numpy()… but why is this happening in the first place? And, no, this is not a matter of printing preferences; I can set the precision as high as I want with np.set_printoptions and it is still rounded to 2 decimal places.
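One way to check whether anything was actually rounded is to compare the stored values directly instead of their printed form. The sketch below uses a plain NumPy float16 array as a stand-in for `x.numpy()` (an assumption, so it runs without PyTorch); the exact spacing inside the printed strings may vary between NumPy versions.

```python
import numpy as np

# Stand-in for x.numpy(): the same five values, stored as float16.
arr = np.array([64.6250, 61.7812, 61.8750, 64.7500, 64.5000], dtype=np.float16)

# The stored values are exactly what the tensor printed; nothing was rounded.
exact = arr.astype(np.float64).tolist()
print(exact)  # [64.625, 61.78125, 61.875, 64.75, 64.5]

# The default str() shows only enough digits to identify each float16 value...
print(arr)
# ...while upcasting (as x.float().numpy() does) prints more digits for the same numbers.
print(arr.astype(np.float32))
```

The upcast changes only how the numbers are displayed, not their values: `arr` and `arr.astype(np.float32)` compare equal element by element.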

Here is my PyTorch version info from conda list:

pytorch                   2.1.0           py3.10_cuda11.8_cudnn8.7.0_0    pytorch
pytorch-cuda              11.8                 h7e8668a_5    pytorch

Linux version info:

NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

ptrblck · February 26, 2024, 4:12am

Increase the precision in numpy’s print options and set floatmode="fixed", and numpy will show the same values as PyTorch:

```python
import numpy as np
np.set_printoptions(precision=4, floatmode="fixed")
print(x.numpy())
# [64.6250 61.7812 61.8750 64.7500 64.5000]
```
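If changing NumPy's global print options is undesirable, the same settings can be applied more locally. This sketch uses `np.array2string`, which accepts the same `precision` and `floatmode` keywords as `np.set_printoptions`, and the `np.printoptions` context manager; the float16 array is a stand-in for `x.numpy()`.

```python
import numpy as np

arr = np.array([64.625, 61.78125, 61.875, 64.75, 64.5], dtype=np.float16)

# Format only this array with 4 fixed fractional digits; global options are untouched.
s = np.array2string(arr, precision=4, floatmode="fixed")
print(s)

# Or scope the options to a block; they are restored when the block exits.
with np.printoptions(precision=4, floatmode="fixed"):
    print(arr)
```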

gholste · February 26, 2024, 4:28am

Wow, okay thanks! I tried this, but did not specify the proper floatmode. I guess I don’t understand the default numpy print option behavior here: 64.625 is rounded to 64.6 but 61.875 is rounded to 61.88?
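A likely explanation, based on NumPy's documented `floatmode` options: the default is `"maxprec"`, in which `precision` is only an upper bound and each element is printed with the fewest digits that still identify it uniquely *as a float16*. float16 is coarse (between 32 and 64 consecutive values are 0.03125 apart, between 64 and 128 they are 0.0625 apart), so "64.6" already maps back to 64.625, while "61.9" would map to the neighboring value 61.90625, which means 61.875 needs a second digit. The sketch below checks this with `np.format_float_positional`, which applies the same shortest-unique rule by default.

```python
import numpy as np

values = [64.625, 61.78125, 61.875, 64.75, 64.5]

shortest = []
for v in values:
    h = np.float16(v)
    # Shortest decimal string that round-trips to this exact float16 value.
    short = np.format_float_positional(h)
    # Parsing that string back as float16 recovers the original value exactly.
    assert np.float16(short) == h
    shortest.append(short)
    print(f"{v!r:>10} -> {short}")

# One fractional digit is not enough for 61.875: "61.9" parses to a different float16.
print(float(np.float16("61.9")))  # 61.90625
```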

This was causing issues for me because I was storing these numpy arrays in pandas DataFrame cells (now I see why this is not recommended…). They ended up represented as strings, so I was actually using the printed, 2-decimal-place values for evaluation instead of the original ones.
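Precision is only lost when the printed text is read back at a higher precision than float16. In this minimal illustration, `str()` and `float()` stand in for whatever stringification happened on the pandas side (the exact pandas path is not shown in the thread): the shortest float16 string "64.6" parses back to 64.625 as a float16, but to 64.6 as a Python float. Upcasting before converting to text, as `x.float().numpy()` does, avoids the mismatch.

```python
import numpy as np

h = np.float16(64.625)

# str() of a float16 scalar uses the shortest string that identifies it as a float16.
text = str(h)
print(text)  # 64.6

# Read back as float16 the value is exact; read back as a Python float it is not.
assert np.float16(text) == h
print(float(text))  # 64.6, not 64.625

# Converting to a wider float before turning the value into text keeps every digit.
safe_text = str(float(h))
print(safe_text)  # 64.625
```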