I am really new to this field and have a few questions. I am trying to quantize a model using the MIT Han Lab AWQ quantization method. I have successfully pulled the model from Hugging Face and applied quantization inside a Docker container. I now have a file titled "Llama-2-7b-hf-w4-g128-awq-v2.pt". I desperately need help figuring out how to use the model from here, since I need to benchmark its performance. I do not know where to start, so if anyone can guide me (or provide links or resources) I would be eternally grateful. Any help would be greatly appreciated.
Thank you in advance!!
I assume your script has created this file? If so, check which object was passed to the `torch.save` method. I would guess it's either the `model.state_dict()`, which will contain all trained parameters and buffers, or a custom `dict` containing the `state_dict` as well as other objects and data, e.g. the `optimizer.state_dict()` etc.
The file itself is just an archive and can be loaded via `torch.load`. Load it in a new script and check its content.
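A minimal sketch of that inspection step, using a tiny `nn.Linear` as a stand-in since I don't have your actual AWQ checkpoint (the filename here is just an example, not your file):

```python
import torch
import torch.nn as nn

# Stand-in for your quantized model; replace with your actual checkpoint path.
model = nn.Linear(4, 2)
torch.save(model.state_dict(), "example.pt")

# torch.load returns whatever object was saved -- here an OrderedDict
# mapping parameter names to tensors.
ckpt = torch.load("example.pt", map_location="cpu")

# Print each key and the shape of its tensor to see what the file contains.
for name, tensor in ckpt.items():
    print(name, tuple(tensor.shape))
```

If your `.pt` file instead holds a custom `dict`, the top-level keys (e.g. something like `"model"` or `"state_dict"`) will tell you which entry contains the weights.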
Just to add to @ptrblck's comment: the `.pt` file doesn't save the model itself (just the weights, as @ptrblck said). So, if you don't have the original source code for the model, you'll need to re-code the model and then load the weights.
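To illustrate that "rebuild the architecture, then load the weights" flow, here is a hedged sketch with a made-up `TinyNet` module standing in for the real Llama/AWQ model (for the actual model you would construct it from the mit-han-lab/llm-awq or `transformers` code and load this checkpoint into it):

```python
import torch
import torch.nn as nn

# The checkpoint only holds tensors; the architecture must exist in code.
# TinyNet is a hypothetical stand-in for the real model definition.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

# Save weights from one instance...
torch.save(TinyNet().state_dict(), "weights.pt")

# ...and restore them into a freshly constructed instance.
model = TinyNet()
missing, unexpected = model.load_state_dict(
    torch.load("weights.pt", map_location="cpu")
)
# Both lists are empty when the rebuilt architecture matches the checkpoint;
# mismatched keys here usually mean the model code doesn't match the weights.
print(missing, unexpected)
```

Once the weights load cleanly, you can put the model in `model.eval()` and run your benchmark inputs through it.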