Hello,
I fine-tuned the MPT 7B model on a large dataset for text generation. As a result of fine-tuning, I received two PyTorch model files, as follows:
model1 = pytorch_model-00001-of-00002.bin
model2 = pytorch_model-00002-of-00002.bin
I used the following code to load the models:
# model1 and model2 are the two shard files above, loaded with torch.load
model1 = torch.load("pytorch_model-00001-of-00002.bin")
model2 = torch.load("pytorch_model-00002-of-00002.bin")
combined_state_dict = {**model1, **model2}
# Load config
config = AutoConfig.from_pretrained("path", trust_remote_code=True, torch_dtype=torch.bfloat16)
# Initialize model
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True).to(device)  # device defined elsewhere
# Load combined state dict
model.load_state_dict(combined_state_dict)
Q1: Why did I receive two model files instead of one?
Q2: Can anyone tell me whether the above code is valid for inference, or should I combine these files in a different way?
Hi,
I only fine-tuned the MPT 7B model starting from the pre-trained weights, without making any changes to the config file, and I set a directory path for storing the final fine-tuned PyTorch model. Upon examining that directory, I found that the fine-tuning process had produced two pytorch_model-*.bin shard files rather than one. (As far as I can tell, this happens simply because save_pretrained splits checkpoints larger than a shard-size limit, 10 GB by default at the time, into multiple files, and a 7B model is larger than that.) There is also a JSON file named 'pytorch_model.bin.index.json', whose weight_map records which shard file contains each weight: some layers are stored in the first shard, others in the second. I used that weight_map to load each layer from the correct file.
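To make the shard layout concrete, here is a minimal sketch (assuming the same base_path placeholder as in the code below) that only inspects the index and shows which shard each tensor lives in:

import json
from collections import Counter

base_path = "/path/"
with open(base_path + "pytorch_model.bin.index.json") as f:
    idx = json.load(f)

# How many tensors are stored in each shard file
print(Counter(idx["weight_map"].values()))
# A few (weight name, shard file) pairs from the mapping
print(list(idx["weight_map"].items())[:3])

With that mapping, the following code was instrumental in resolving the issue: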
import json
import torch
from transformers import AutoConfig, AutoModelForCausalLM

base_path = "/path/"

# Load the index information
with open(base_path + "pytorch_model.bin.index.json", "r") as f:
    index_info = json.load(f)

# Initialize an empty state dict to hold the weights
state_dict = {}
loaded_dicts = {}

for weight_name, file_name in index_info["weight_map"].items():
    # Load the state dict for this file (if it hasn't been loaded already)
    if file_name not in loaded_dicts:
        loaded_dicts[file_name] = torch.load(base_path + file_name)
    # Get the weights for this layer
    layer_weights = loaded_dicts[file_name][weight_name]
    # Add these weights to our combined state dict
    state_dict[weight_name] = layer_weights

# Load config
config_path = base_path + "config.json"  # Replace with actual config path if different
config = AutoConfig.from_pretrained(config_path, trust_remote_code=True)

# Initialize model
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True, torch_dtype=torch.bfloat16)

# Load combined state dict into the model
model.load_state_dict(state_dict)
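For completeness, I believe the manual merge isn't strictly required: from_pretrained can resolve the sharded checkpoint through the same index file on its own when pointed at the output directory. A minimal sketch, assuming base_path contains config.json, the index file, and both .bin shards:

import torch
from transformers import AutoModelForCausalLM

# from_pretrained reads pytorch_model.bin.index.json and loads both shards itself
model = AutoModelForCausalLM.from_pretrained(
    base_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model.eval()

Either way, once load_state_dict (or from_pretrained) completes without missing or unexpected keys, the model should be ready for inference.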