Hi, I’m training a ResNet model on a machine with 4x A40 GPUs. The code is from this repository.
When I run the program, it logs:
/home/van-tien.pham/anaconda3/lib/python3.9/site-packages/apex/__init__.py:68: DeprecatedFeatureWarning: apex.amp is deprecated and will be removed by the end of February 2023. Use [PyTorch AMP](https://pytorch.org/docs/stable/amp.html)
warnings.warn(msg, DeprecatedFeatureWarning)
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : 128.0
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
/home/van-tien.pham/anaconda3/lib/python3.9/site-packages/apex/__init__.py:68: DeprecatedFeatureWarning: apex.parallel.DistributedDataParallel is deprecated and will be removed by the end of February 2023.
warnings.warn(msg, DeprecatedFeatureWarning)
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
RUNNING EPOCHS FROM 0 TO 250
DLL 2023-06-14 15:12:48.267767 - Epoch: 0 Iteration: 100 train.loss : 6.95447 train.total_ips : 1020.75 img/s
DLL 2023-06-14 15:14:16.889414 - Epoch: 0 Iteration: 200 train.loss : 6.85262 train.total_ips : 1042.96 img/s
My understanding is that training still works, but I would like to replace NVIDIA Apex with PyTorch's native AMP, as the deprecation warning recommends. In the main program, the current code is:
try:
    from apex.parallel import DistributedDataParallel as DDP
    from apex.fp16_utils import *
    from apex import amp
except ImportError:
    raise ImportError(
        "Please install apex from https://www.github.com/nvidia/apex to run this example."
    )
# other codes
#############
if args.amp:
    model_and_loss, optimizer = amp.initialize(
        model_and_loss,
        optimizer,
        opt_level="O1",
        loss_scale="dynamic" if args.dynamic_loss_scale else args.static_loss_scale,
    )
if args.distributed:
    model_and_loss.distributed()
model_and_loss.load_model_state(model_state)
train_loop(
    model_and_loss,
    optimizer,
    lr_policy,
    train_loader,
    val_loader,
    args.fp16,
    logger,
    should_backup_checkpoint(args),
    use_amp=args.amp,
    batch_size_multiplier=batch_size_multiplier,
    start_epoch=start_epoch,
    end_epoch=(start_epoch + args.run_epochs)
    if args.run_epochs != -1
    else args.epochs,
    best_prec1=best_prec1,
    prof=args.prof,
    skip_training=args.evaluate,
    skip_validation=args.training_only,
    save_checkpoints=args.save_checkpoints and not args.evaluate,
    checkpoint_dir=args.workspace,
    checkpoint_filename=args.checkpoint_filename,
    args=args,
)
I guess this is the call that needs to be changed:
model_and_loss, optimizer = amp.initialize(
    model_and_loss,
    optimizer,
    opt_level="O1",
    loss_scale="dynamic" if args.dynamic_loss_scale else args.static_loss_scale,
)
I read some threads about PyTorch AMP (e.g. Torch distributed data-parallel vs Apex distributed data-parallel - #5 by c_cj) and found another repository that uses PyTorch's native AMP as follows:
if config.AMP_OPT_LEVEL != "O0":
    if use_amp == 'apex':
        model, optimizer = amp.initialize(model,
                                          optimizer,
                                          opt_level=config.AMP_OPT_LEVEL)
        loss_scaler = ApexScaler()
        if config.LOCAL_RANK == 0:
            logger.info(
                'Using NVIDIA APEX AMP. Training in mixed precision.')
    if use_amp == 'native':
        amp_autocast = torch.cuda.amp.autocast
        loss_scaler = NativeScaler()
        if config.LOCAL_RANK == 0:
            logger.info(
                'Using native Torch AMP. Training in mixed precision.')
else:
    if config.LOCAL_RANK == 0:
        logger.info('AMP not enabled. Training in float32.')
However, I’m unable to figure out how to use PyTorch AMP to replace the aforementioned call, model_and_loss, optimizer = amp.initialize(...), in this code base.
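For reference, the generic native-AMP pattern from the PyTorch AMP documentation looks roughly like the sketch below. It is only an illustration under assumptions: `model`, `criterion`, and `train_step` are hypothetical stand-ins, not the repository's `model_and_loss` or `train_loop`, and the sketch turns AMP off when no GPU is present. As far as I understand, `GradScaler` always scales dynamically, so `init_scale` would only set the starting value and is not a direct equivalent of Apex's static `loss_scale`.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the repository's model_and_loss and optimizer.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # in this sketch, AMP is simply disabled on CPU

model = nn.Linear(16, 4).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Takes the place of amp.initialize(...): the model and optimizer are left
# untouched, and only a gradient scaler is created. The loss scale is dynamic,
# starting from init_scale (default 65536.0).
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(inputs, targets):
    """Run one optimization step with native mixed precision."""
    optimizer.zero_grad()
    # Takes the place of opt_level="O1": eligible ops inside this context
    # are automatically cast to lower precision.
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = criterion(model(inputs), targets)
    # Takes the place of Apex's
    #   with amp.scale_loss(loss, optimizer) as scaled_loss:
    #       scaled_loss.backward()
    scaler.scale(loss).backward()
    # Unscales the gradients and skips optimizer.step() if they contain inf/NaN.
    scaler.step(optimizer)
    # Adjusts the loss scale for the next iteration.
    scaler.update()
    return loss.item()
```

In the repository, the body of `train_step` would presumably have to live wherever `train_loop` currently calls `amp.scale_loss`, and `model_and_loss.distributed()` would need to wrap the model with `torch.nn.parallel.DistributedDataParallel` instead of the Apex class; I have not verified either of these.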
Another question: does training with NVIDIA Apex AMP versus PyTorch native AMP yield different accuracy, or does the choice only affect training speed?
Recommendations are appreciated! Thanks in advance!