I want to run a few inference passes on a pre-trained model that was trained purely in FP32 (without apex or amp). My main aim is to get faster results during inference, not during training. Ideally, I want to enable mixed precision after the model has been trained and before it is fed unseen data. How can this be done with the NVIDIA apex library? It would be great if some code snippets could be attached too.
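For concreteness, this is roughly what I have in mind (just a sketch; MyModel and the checkpoint path are placeholders, and I am assuming amp.initialize can be called with only a model and no optimizer when it is used purely for inference):

import torch
from apex import amp

device = torch.device("cuda")

# Hypothetical model class and checkpoint path, for illustration only
model = MyModel().to(device)
model.load_state_dict(torch.load("model_fp32.pth", map_location=device))
model.eval()

# Assumption: amp.initialize accepts just the model (no optimizer)
# so that only the forward pass is cast under opt_level O1
model = amp.initialize(model, opt_level="O1")

with torch.no_grad():
    output = model(torch.rand(1, 3, 224, 224, device=device))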
Secondly, consider a model trained and inferred purely in FP32, say modelA, and a model trained in FP32 (without apex or amp) but inferred with apex (opt_level='O1'), say modelB. How would the inference execution time of the code below differ between the two models?
import torch

modelA = modelA.to(device)  # to be inferred without apex
modelB = modelB.to(device)  # to be inferred with apex
tensor = torch.rand(1, C, H, W).to(device)  # random tensor for testing

with torch.no_grad():
    modelA.eval()
    modelA(tensor)  # inference
    # Calculate cuda_time for the execution

with torch.no_grad():
    modelB.eval()
    # Initialize apex (opt_level='O1') code snippets for faster inference
    modelB(tensor)  # inference
    # Calculate cuda_time for the execution
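For the "Calculate cuda_time" part, I was planning to use CUDA events around each forward pass, along these lines (a sketch; the warm-up count is arbitrary and modelA/tensor are from the snippet above):

# Measure GPU time for one forward pass with torch.cuda.Event timers
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

with torch.no_grad():
    for _ in range(10):          # warm-up iterations (arbitrary count)
        modelA(tensor)
    torch.cuda.synchronize()     # make sure warm-up work has finished
    start.record()
    modelA(tensor)
    end.record()
    torch.cuda.synchronize()     # wait for the timed kernel to complete
    print("modelA forward:", start.elapsed_time(end), "ms")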