FP8 support on H100

With H100 supporting FP8, is there any plan to support FP8 training in PyTorch?
Right now it looks like the only alternative is to use FP8 through NVIDIA Transformer Engine, and even Transformer Engine doesn't support FP8 for all nn modules (e.g. conv layers).

@drisspg has been working on this

@marksaroufim is there a timeline for the feature to be available?

Hey @navmarri, we have actually already landed some FP8 primitives in PyTorch. Specifically, we have landed the fp8e4m3 and fp8e5m2 dtypes and a private function (with no BC/FC guarantees) for doing scaled matmul on H100 machines.

For a quick example script of how you can use it, check out:
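Roughly, the idea looks like this (a minimal, unofficial sketch, assuming an H100 and a recent build; `torch._scaled_mm` is private, so its exact signature and return value may differ between releases):

```python
import torch

device = "cuda"  # requires an H100 (sm90) for the FP8 matmul kernels

# Inputs in higher precision, shaped like an nn.Linear: x @ w.t()
x = torch.randn(16, 32, device=device)
w = torch.randn(64, 32, device=device)  # (out_features, in_features)

# Cast to FP8: e4m3 is typically used for activations/weights,
# e5m2 for gradients (not shown here).
x_fp8 = x.to(torch.float8_e4m3fn)
w_fp8 = w.to(torch.float8_e4m3fn)

# Per-tensor scales; real training code derives these from amax statistics.
scale_x = torch.tensor(1.0, device=device)
scale_w = torch.tensor(1.0, device=device)

# Private op, no BC/FC guarantees: the second operand is expected in
# column-major layout, hence the transpose. Some older builds returned an
# (output, amax) tuple instead of a single tensor.
out = torch._scaled_mm(
    x_fp8,
    w_fp8.t(),
    scale_a=scale_x,
    scale_b=scale_w,
    out_dtype=torch.bfloat16,
)
print(out.shape, out.dtype)  # torch.Size([16, 64]) torch.bfloat16
```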

@drisspg awesome!
Does this work with FP8-precision training using FSDP on H100s?

As far as I know (and as also mentioned in NVIDIA/TransformerEngine),
the RTX 4000 series (compute capability 8.9) should also support FP8 compute.
Does PyTorch support that too,
or is it H100-only for now, with further work needed for sm89 cards?

Is mixed-precision support planned for FP8?

There was an announcement in January in the dev forum:

TransformerEngine has supported FP8 for a long time now. Alternatively, you can also play around with pytorch-labs/float8_experimental.
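The workflow there is module swapping: replace nn.Linear layers with FP8-aware linears and train as usual. A rough sketch, assuming the repo's swap helper (the names `swap_linear_with_float8_linear` and `Float8Linear` come from float8_experimental and may have changed as the project evolved):

```python
import torch
import torch.nn as nn

# Names below are from pytorch-labs/float8_experimental and may differ
# between revisions of the repo; treat this as an illustrative sketch.
from float8_experimental.float8_linear import Float8Linear
from float8_experimental.float8_linear_utils import swap_linear_with_float8_linear

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).to("cuda")

# Replace every nn.Linear with an FP8 linear that does scaled FP8 matmuls
# in forward/backward while keeping the weights in higher precision.
swap_linear_with_float8_linear(model, Float8Linear)

# Train as usual; the swapped modules handle FP8 casting and scaling internally.
x = torch.randn(8, 1024, device="cuda", requires_grad=True)
loss = model(x).sum()
loss.backward()
```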

Thanks, my point was more related to PyTorch AMP support.