FSDP MixedPrecision vs AMP autocast?

Given an arbitrary fp32 nn.Module that fits on a single GPU, is there a full enumeration of the differences between

  • MixedPrecision(torch.bfloat16, torch.float32)
  • torch.autocast("cuda", dtype=torch.bfloat16)

in computation?

I have noticed that certain modules/methods do not run in the expected precision under FSDP MixedPrecision, so there must be at least some difference between the two.
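
For reference, here is a minimal single-GPU sketch of the two setups I am comparing. The toy model, shapes, and single-process group setup are placeholders for illustration, not my actual code:

```python
import copy
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

# Single-process group so FSDP can wrap the model on one GPU (placeholder setup).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)

# Toy fp32 model and input; architecture and shapes are arbitrary placeholders.
model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024)).cuda()
x = torch.randn(8, 1024, device="cuda")

# Setup 1: FSDP mixed precision -- parameters are cast to bf16 for compute,
# while gradient reduction stays in fp32.
fsdp_model = FSDP(
    copy.deepcopy(model),
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.float32),
)
out_fsdp = fsdp_model(x)

# Setup 2: plain AMP autocast around the untouched fp32 model -- inputs to ops
# are cast per autocast's op-level policy, not the parameters themselves.
with torch.autocast("cuda", dtype=torch.bfloat16):
    out_amp = model(x)

print(out_fsdp.dtype, out_amp.dtype)

dist.destroy_process_group()
```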