I have saved checkpoints of my model every n iterations while training. After I load the checkpoints, I quickly get networks full of NaN
s which soon crash. After investigating I’ve found the state_dict
s I’ve saved are full of NaN
s. However, I believe torch.save
is working as intended and not corrupting my data. This suggests that my networks are training while full of NaN
s, but this doesn’t make sense because the networks continue to run without issue. It’s only until I load the serialized state_dict
s that the code crashes.
It seems like either the serialization code is corrupting my network’s parameters or the network is running without issue while full of NaN
values and only crashing once it gets deserialized and loaded back in.
My loading/saving code is simple:
# Save model parameters
def save_checkpoint(self, env_name, suffix="", ckpt_path=None):
if not os.path.exists('checkpoints/'):
os.makedirs('checkpoints/')
if ckpt_path is None:
ckpt_path = "checkpoints/sac_checkpoint_{}_{}".format(env_name, suffix)
print('Saving models to {}'.format(ckpt_path))
torch.save({'policy_state_dict': self.policy.state_dict(),
'critic_state_dict': self.critic.state_dict(),
'critic_target_state_dict': self.critic_target.state_dict(),
'critic_optimizer_state_dict': self.critic_optim.state_dict(),
'policy_optimizer_state_dict': self.policy_optim.state_dict()}, ckpt_path)
# Load model parameters
def load_checkpoint(self, ckpt_path, evaluate=False):
print('Loading models from {}'.format(ckpt_path))
if ckpt_path is not None:
checkpoint = torch.load(ckpt_path)
self.policy.load_state_dict(checkpoint['policy_state_dict'])
self.critic.load_state_dict(checkpoint['critic_state_dict'])
self.critic_target.load_state_dict(checkpoint['critic_target_state_dict'])
self.critic_optim.load_state_dict(checkpoint['critic_optimizer_state_dict'])
self.policy_optim.load_state_dict(checkpoint['policy_optimizer_state_dict'])
if evaluate:
self.policy.eval()
self.critic.eval()
self.critic_target.eval()
else:
self.policy.train()
self.critic.train()
self.critic_target.train()
My networks are also fairly simple MLPs
class QNetwork(torch.jit.ScriptModule):
def __init__(self, num_inputs, num_actions, hidden_dim):
super(QNetwork, self).__init__()
# Q1 architecture
self.linear1 = nn.Linear(num_inputs + num_actions, hidden_dim)
self.linear2 = nn.Linear(hidden_dim, hidden_dim)
self.linear3 = nn.Linear(hidden_dim, 1)
# Q2 architecture
self.linear4 = nn.Linear(num_inputs + num_actions, hidden_dim)
self.linear5 = nn.Linear(hidden_dim, hidden_dim)
self.linear6 = nn.Linear(hidden_dim, 1)
self.apply(weights_init_)
@torch.jit.script_method
def forward(self, state, action):
xu = torch.cat([state, action], 1)
x1 = F.relu(self.linear1(xu))
x1 = F.relu(self.linear2(x1))
x1 = self.linear3(x1)
x2 = F.relu(self.linear4(xu))
x2 = F.relu(self.linear5(x2))
x2 = self.linear6(x2)
return x1, x2
I’ve tried changing my learning rate and using normalization layers but no luck. Any ideas? I can share more code if that would help.