I think my model’s parameters are very small. The histograms below show that most of the parameter values are < 0.1 in magnitude. I am worried that this makes my model sensitive to floating-point precision.
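To quantify the precision concern, here is a quick check of float32 spacing around these magnitudes (plain NumPy, nothing specific to my model):

```
import numpy as np

# float32 has roughly constant *relative* precision (~1.2e-7),
# so values around 0.1 or 0.01 are still resolved very finely in absolute terms.
print(np.finfo(np.float32).eps)        # ~1.1920929e-07
print(np.spacing(np.float32(0.1)))     # ~7.45e-09, gap to the next representable float32 above 0.1
print(np.spacing(np.float32(0.01)))    # ~9.3e-10
```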

My model is a fairly typical autoregressive generative model with 12 decoder-only transformer layers (almost identical to GPT-2).
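For reference, GPT-2-style models typically initialize most weight matrices from N(0, 0.02²) (e.g. `initializer_range=0.02` in the Hugging Face GPT2Config), so most elements being below 0.1 may simply reflect the initialization scale. A quick simulation of what that init alone looks like (purely illustrative, not my actual checkpoint):

```
import numpy as np

rng = np.random.default_rng(0)
# Simulate a GPT-2-style init: weights drawn from N(0, 0.02^2)
w = rng.normal(loc=0.0, scale=0.02, size=1_000_000)
# 0.1 is 5 standard deviations, so essentially every element is below it
print(f"fraction of |w| < 0.1: {np.mean(np.abs(w) < 0.1):.6f}")  # ~0.999999
```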

Is the scale of my model’s parameters acceptable?

You can create these histograms for your own model’s parameters with the code below.

Can anyone share their model’s parameter scale?

```
import numpy as np
import torch
import matplotlib.pyplot as plt


def visualize_parameter_scale(state_dict_path: str, n_shown: int = 100_000) -> None:
    """Visualize your model's parameter scale with histograms.

    Args:
        state_dict_path (str): Path to a state dict that stores model parameters.
        n_shown (int, optional): If specified, the given number of parameters are
            randomly sampled for the plot. The default is 100,000. If you want to
            plot everything, set this to None.

    Returns:
        None
    """
    # Load the state dict on CPU
    sd = torch.load(state_dict_path, map_location="cpu")

    # Collect weight and bias elements separately
    weights = []
    biases = []
    for k, v in sd.items():
        # Cast to float32 so half/bfloat16 tensors convert to numpy cleanly
        f = v.detach().flatten().float().numpy()
        if ".weight" in k:
            weights.append(f)
        elif ".bias" in k:
            biases.append(f)
    weights = np.hstack(weights)
    biases = np.hstack(biases)

    # Log-spaced bins cannot represent negative values, so plot magnitudes
    # (otherwise the negative half of each distribution is silently dropped)
    weights = np.abs(weights)
    biases = np.abs(biases)

    # Subsample for plotting speed, without replacement
    if n_shown is not None:
        n_shown = int(n_shown)
        weights = np.random.choice(weights, min(n_shown, weights.size), replace=False)
        biases = np.random.choice(biases, min(n_shown, biases.size), replace=False)

    # Plot
    fig, axes = plt.subplots(1, 2, figsize=(10, 5))
    axes[0].hist(x=weights, bins=np.logspace(-5, 1, 100), color='dodgerblue', alpha=0.75)
    axes[0].set_xscale('log')
    axes[0].set_title("Weights")
    axes[1].hist(x=biases, bins=np.logspace(-5, 1, 100), color='dodgerblue', alpha=0.75)
    axes[1].set_xscale('log')
    axes[1].set_title("Biases")
    fig.supxlabel('Element magnitude (absolute value)')
    fig.supylabel('Number of parameter elements')
    fig.suptitle("Model parameter element scale")
    plt.show()
```
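For example (the first path is just a placeholder for your own checkpoint; the second call assumes the Hugging Face `transformers` package is installed and compares against the public GPT-2 weights):

```
import torch

# Plot your own checkpoint (hypothetical path)
visualize_parameter_scale("checkpoints/my_gpt2_like_model.pt")

# For comparison: the public GPT-2 checkpoint
# (note: in some transformers versions the state dict also contains attention-mask
# buffers whose names match ".bias", which then get mixed into the bias histogram)
from transformers import GPT2LMHeadModel

ref = GPT2LMHeadModel.from_pretrained("gpt2")
torch.save(ref.state_dict(), "gpt2_reference.pt")
visualize_parameter_scale("gpt2_reference.pt")
```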