When entering an autocast-enabled region, Tensors may be of any type. You should not call half() or bfloat16() on your model(s) or inputs when using autocasting.
But how should I write my code so that model(x) runs in bfloat16 while loss_fun(y_pred, y_true) later runs in float32?
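One way to sketch this (using a placeholder model and shapes, not the original code): run only the forward pass inside the autocast context, then cast the predictions back to float32 before computing the loss outside it.

```python
import torch
import torch.nn as nn

# Placeholder binary classifier; the real model and shapes will differ.
model = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
loss_fun = nn.BCELoss()

x = torch.randn(4, 8)
y_true = torch.rand(4, 1)  # float32 targets

# Only the forward pass is autocast: matmuls run in bfloat16 on CPU.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y_pred = model(x)  # y_pred comes out as bfloat16

# Outside the autocast region, cast predictions to float32 so both
# BCELoss inputs have matching float32 dtypes.
loss = loss_fun(y_pred.float(), y_true)
print(loss.dtype)  # torch.float32
```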
I just realized I made a silly mistake: I forgot that my training code has a utility module that consumes y_pred and defines a loss_fun to re-compute the loss together with other metrics for tracking/logging. I did not wrap that module's loss computation under autocast, even though it consumes the same bfloat16 y_pred.
The errors I reported in my original post actually came from that utility module. If I don't explicitly convert y_true to bfloat16, loss_fun receives two input arguments of different dtypes. If I do explicitly convert y_true to bfloat16, both arguments are bfloat16, but binary_cross_entropy cannot process bfloat16 inputs because it is not running under autocast.
In conclusion, autocast should work well with BCELoss() on CPU, as long as the loss computation itself stays inside the autocast region.
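The conclusion above can be sketched as follows (again with a placeholder model): when both the forward pass and the loss are inside the autocast region, CPU autocast casts the binary_cross_entropy inputs up to float32 automatically, so no manual dtype conversion is needed.

```python
import torch
import torch.nn as nn

# Placeholder binary classifier illustrating the conclusion.
model = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
loss_fun = nn.BCELoss()

x = torch.randn(4, 8)
y_true = torch.rand(4, 1)  # float32 targets, left as-is

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y_pred = model(x)               # runs in bfloat16
    loss = loss_fun(y_pred, y_true)  # autocast promotes inputs to float32

print(y_pred.dtype, loss.dtype)  # torch.bfloat16 torch.float32
```

Note the contrast with the failure described above: calling the same loss_fun on bfloat16 inputs *outside* autocast raises an error, because binary_cross_entropy is only up-cast to float32 when autocast is active.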