Finetuning MPT 7B


I fine-tuned the MPT 7B model on a large dataset for text generation. As a result of fine-tuning, I received two PyTorch model files:

model1 = pytorch_model-00001-of-00002.bin
model2 = pytorch_model-00002-of-00002.bin

I used the following code to load the models:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the two shards and merge their state dicts
model1 = torch.load('pytorch_model-00001-of-00002.bin', map_location='cpu')
model2 = torch.load('pytorch_model-00002-of-00002.bin', map_location='cpu')
combined_state_dict = {**model1, **model2}

# Load config
config = AutoConfig.from_pretrained('path', trust_remote_code=True, torch_dtype=torch.bfloat16)

# Initialize model
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True).to(device)

# Load combined state dict
model.load_state_dict(combined_state_dict)
Q1: Why did I receive two models instead of one?

Q2: Can anyone tell me whether the above code is valid for inference, or should I combine these models in a different way?

  1. It’s unclear, since you didn’t provide any information about how you are serializing the model. A .bin extension could also indicate that a 3rd-party package was used to store everything in raw binary.

  2. I guess you are using a higher-level API from Hugging Face, so their discussion board might be a better place to ask about their API (see the sketch after this list).
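If the two .bin files came from Hugging Face’s save_pretrained (which by default shards checkpoints larger than roughly 10 GB), then from_pretrained on the checkpoint directory should reassemble them automatically; a minimal sketch, assuming the directory also contains config.json and pytorch_model.bin.index.json:

import torch
from transformers import AutoModelForCausalLM

# from_pretrained reads pytorch_model.bin.index.json and loads every shard itself
model = AutoModelForCausalLM.from_pretrained('/path/', trust_remote_code=True, torch_dtype=torch.bfloat16)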

I only fine-tuned the MPT 7B model using the pre-trained weights, without making any changes to the config file. Additionally, I designated a directory path for storing the final fine-tuned PyTorch models. Upon examining the directory, I discovered that the two .bin shard files are a result of the fine-tuning process. There’s also a JSON file named ‘pytorch_model.bin.index.json’, which maps each layer to the shard file that contains it: some layers live in the first file, while others are in the second. I used the weight_map from this index to load each layer from the correct file. The following code resolved the issue:

import json
import torch
from transformers import AutoConfig, AutoModelForCausalLM

base_path = "/path/"

# Load the index information
with open(base_path + 'pytorch_model.bin.index.json', 'r') as f:
    index_info = json.load(f)
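
# For reference, index_info has roughly this shape (the parameter names
# below are illustrative, not the exact MPT 7B names):
# {
#   "metadata": {"total_size": ...},
#   "weight_map": {
#     "transformer.wte.weight": "pytorch_model-00001-of-00002.bin",
#     ...
#     "transformer.norm_f.weight": "pytorch_model-00002-of-00002.bin"
#   }
# }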

# Initialize an empty state dict to hold the weights
state_dict = {}
loaded_dicts = {}

for weight_name, file_name in index_info['weight_map'].items():
    # Load the state dict for this file (if it hasn't been loaded already)
    if file_name not in loaded_dicts:
        loaded_dicts[file_name] = torch.load(base_path + file_name)

    # Get the weights for this layer
    layer_weights = loaded_dicts[file_name][weight_name]

    # Add these weights to our combined state dict
    state_dict[weight_name] = layer_weights
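
As an optional sanity check, you can confirm that every weight listed in the index ended up in the combined state dict:

# Every key in the weight_map should now be present in state_dict
assert set(state_dict) == set(index_info['weight_map'])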

# Load config
config_path = base_path + 'config.json'  # Replace with actual config path if different
config = AutoConfig.from_pretrained(config_path, trust_remote_code=True)

# Initialize model
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True, torch_dtype=torch.bfloat16)

# Load combined state dict into the model
model.load_state_dict(state_dict)
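
Once the weights are loaded, inference works as with any causal LM. A minimal sketch, assuming the tokenizer files were saved in the same directory (otherwise, load the tokenizer used by the base MPT 7B checkpoint) and with a made-up prompt:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)

model.eval()
inputs = tokenizer('Once upon a time', return_tensors='pt')
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))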