I have fine-tuned a bert-base model both with and without AMP (automatic mixed precision), using MAX_SEQ_LEN=512. I compared the performance of these models in terms of:
- Fine-tuning time
- Inference time on CPU/GPU
- Model size
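For reference, here is roughly what my AMP fine-tuning loop looks like. This is a minimal sketch, not my exact training script; the dummy batch and hyperparameters stand in for my real data pipeline:

```python
import torch
from torch.cuda.amp import autocast, GradScaler
from transformers import BertForSequenceClassification, BertTokenizerFast

model = BertForSequenceClassification.from_pretrained("bert-base-uncased").cuda()
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scaler = GradScaler()

# Dummy batch just to make the snippet runnable; the real data uses MAX_SEQ_LEN=512
batch = tokenizer(["an example sentence"], padding="max_length", max_length=512,
                  truncation=True, return_tensors="pt")
batch = {k: v.cuda() for k, v in batch.items()}
batch["labels"] = torch.tensor([0]).cuda()

optimizer.zero_grad()
with autocast():                      # forward pass runs in mixed precision
    loss = model(**batch).loss
scaler.scale(loss).backward()         # scale loss to avoid fp16 gradient underflow
scaler.step(optimizer)                # unscales grads, skips step on inf/nan
scaler.update()
```

The non-AMP run is identical except that the `autocast` context and the `GradScaler` calls are removed.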
In the first experiment, I observed that in terms of fine-tuning time, the model trained with AMP performs better than the one trained without it.
However, when I compare inference time and model size, both models have the same inference time and the same model size.
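In case my measurement is the issue, this is roughly how I compared model sizes (the file name is a placeholder, and `model` is the fine-tuned model from the loop above):

```python
import os
import torch

# Save each fine-tuned model and compare the checkpoint file sizes
torch.save(model.state_dict(), "bert_finetuned.pt")
size_mb = os.path.getsize("bert_finetuned.pt") / 1024 ** 2
print(f"checkpoint size: {size_mb:.1f} MB")

# Inspect the dtypes of the stored tensors
print({tensor.dtype for tensor in model.state_dict().values()})
```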
Could anyone please explain why this is the case?