How to replace apex.amp with PyTorch AMP?

Hi, I’m training a ResNet model on a machine with 4x A40 GPUs. The code is from this repository.

When I run the program, it logs:

/home/van-tien.pham/anaconda3/lib/python3.9/site-packages/apex/__init__.py:68: DeprecatedFeatureWarning: apex.amp is deprecated and will be removed by the end of February 2023. Use [PyTorch AMP](https://pytorch.org/docs/stable/amp.html)
  warnings.warn(msg, DeprecatedFeatureWarning)
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : 128.0
Warning:  multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
/home/van-tien.pham/anaconda3/lib/python3.9/site-packages/apex/__init__.py:68: DeprecatedFeatureWarning: apex.parallel.DistributedDataParallel is deprecated and will be removed by the end of February 2023.
  warnings.warn(msg, DeprecatedFeatureWarning)
Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
RUNNING EPOCHS FROM 0 TO 250
DLL 2023-06-14 15:12:48.267767 - Epoch: 0 Iteration: 100  train.loss : 6.95447  train.total_ips : 1020.75 img/s
DLL 2023-06-14 15:14:16.889414 - Epoch: 0 Iteration: 200  train.loss : 6.85262  train.total_ips : 1042.96 img/s

My understanding is that things still work, but I wonder how I can replace NVIDIA apex with native PyTorch AMP, as recommended. In the main program, the current code is:

try:
    from apex.parallel import DistributedDataParallel as DDP
    from apex.fp16_utils import *
    from apex import amp
except ImportError:
    raise ImportError(
        "Please install apex from https://www.github.com/nvidia/apex to run this example."
    )
# other codes
#############
if args.amp:
    model_and_loss, optimizer = amp.initialize(
        model_and_loss,
        optimizer,
        opt_level="O1",
        loss_scale="dynamic" if args.dynamic_loss_scale else args.static_loss_scale,
    )

if args.distributed:
    model_and_loss.distributed()

model_and_loss.load_model_state(model_state)

train_loop(
    model_and_loss,
    optimizer,
    lr_policy,
    train_loader,
    val_loader,
    args.fp16,
    logger,
    should_backup_checkpoint(args),
    use_amp=args.amp,
    batch_size_multiplier=batch_size_multiplier,
    start_epoch=start_epoch,
    end_epoch=(start_epoch + args.run_epochs) if args.run_epochs != -1 else args.epochs,
    best_prec1=best_prec1,
    prof=args.prof,
    skip_training=args.evaluate,
    skip_validation=args.training_only,
    save_checkpoints=args.save_checkpoints and not args.evaluate,
    checkpoint_dir=args.workspace,
    checkpoint_filename=args.checkpoint_filename,
    args=args,
)

I guess that this line needs to be modified:
model_and_loss, optimizer = amp.initialize(model_and_loss, optimizer, opt_level="O1", loss_scale="dynamic" if args.dynamic_loss_scale else args.static_loss_scale)

I read some threads about PyTorch AMP (Torch distributed data-parallel vs Apex distributed data-parallel - #5 by c_cj) and another repository that uses torch’s native AMP as follows:

if config.AMP_OPT_LEVEL != "O0":
    if use_amp == 'apex':
        model, optimizer = amp.initialize(model,
                                          optimizer,
                                          opt_level=config.AMP_OPT_LEVEL)
        loss_scaler = ApexScaler()
        if config.LOCAL_RANK == 0:
            logger.info(
                'Using NVIDIA APEX AMP. Training in mixed precision.')
    elif use_amp == 'native':  # elif, otherwise the else below fires whenever the apex branch runs
        amp_autocast = torch.cuda.amp.autocast
        loss_scaler = NativeScaler()
        if config.LOCAL_RANK == 0:
            logger.info(
                'Using native Torch AMP. Training in mixed precision.')
    else:
        if config.LOCAL_RANK == 0:
            logger.info('AMP not enabled. Training in float32.')
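
If I read that snippet correctly, the two objects are then consumed in the training step roughly like this (my sketch, assuming loss_scaler follows the timm-style NativeScaler interface, where calling it runs the scaled backward pass, the optimizer step, and the scale update internally):

# hypothetical training step using the objects created above
with amp_autocast():
    output = model(samples)
    loss = criterion(output, targets)
# scales the loss, calls backward(), steps the optimizer, updates the scale
loss_scaler(loss, optimizer)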

But I’m unable to figure out how to use torch AMP to replace the aforementioned model_and_loss, optimizer = amp.initialize(...) line.

Another question: does training with NVIDIA AMP vs. torch AMP yield different accuracy, or does this only affect training speed?

Recommendations are appreciated! Thanks in advance!

Check these examples to see how amp is applied using the native utils.
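
As a rough sketch (assuming args.amp toggles mixed precision and model_and_loss(input, target) returns the loss, as in your snippet): the amp.initialize call is dropped entirely, and the training loop instead uses torch.cuda.amp.autocast together with a GradScaler:

import torch
from torch.nn.parallel import DistributedDataParallel as DDP  # native replacement for apex.parallel.DistributedDataParallel

# instead of amp.initialize(...), create a gradient scaler;
# GradScaler uses dynamic loss scaling by default, i.e. loss_scale="dynamic"
scaler = torch.cuda.amp.GradScaler(enabled=args.amp)

# if args.distributed, wrap the model with the native DDP instead of apex DDP,
# e.g. model = DDP(model, device_ids=[local_rank])  (local_rank is hypothetical here)

for input, target in train_loader:
    optimizer.zero_grad()
    # run the forward pass and loss computation in mixed precision
    with torch.cuda.amp.autocast(enabled=args.amp):
        loss = model_and_loss(input, target)
    # scale the loss for backward, unscale before the optimizer step,
    # and update the scale factor for the next iteration
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

Regarding your second question: apex.amp at O1 and the native implementation follow the same approach (autocasting ops to float16 with dynamic loss scaling), so the final accuracy should be comparable. The recommendation to switch is about performance and maintenance, since apex.amp is deprecated and no longer developed.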