I’m running a convnet and getting nans. However, there are no obvious divisions in my code, so it’s unclear to me how the nans could be forming. As far as I understand it, there are two ways to obtain a nan result:
- dividing 0 by 0 (dividing a non-zero number by 0 gives ±inf; dividing 0 by a non-zero number gives 0; dividing 0 by 0 gives nan)
- the result of almost any function for which any of the inputs is nan
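The two rules listed above can be checked directly on tensors. This is a minimal sketch, assuming PyTorch float32 tensors on the CPU; the same IEEE-754 rules apply to CUDA float tensors.

```python
import torch

zero = torch.tensor(0.0)
one = torch.tensor(1.0)

print(zero / zero)   # 0 / 0 -> nan
print(one / zero)    # positive / 0 -> inf
print(-one / zero)   # negative / 0 -> -inf
print(zero / one)    # 0 / non-zero -> 0

# nan propagates through almost any operation that takes it as an input
nan = zero / zero
print(nan + 1.0, nan * 0.0, torch.exp(nan))
```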
I have the following statement in my code:
qtargets = qtargets + p.discount_factor * qmax_next * qvalues_next_mask
Yes, it’s an RL thing.
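For context, a statement like the one above typically computes a one-step Q-learning target, r + γ · max_a′ Q(s′, a′) · mask. The sketch below is hypothetical: the names `rewards`, `q_next` and `done`, and the way `qmax_next` and the mask are derived, are assumptions rather than the original code. The mask is assumed to be 0 for terminal transitions and 1 otherwise, and the default discount of 0.5 matches the value of `p.discount_factor` given later.

```python
import torch

def q_learning_targets(rewards, q_next, done, discount_factor=0.5):
    """Hypothetical sketch of r + gamma * max_a' Q(s', a') * mask.

    rewards: (batch,) float tensor of immediate rewards
    q_next:  (batch, num_actions) Q-values of the next states
    done:    (batch,) bool tensor, True where the episode ended
    """
    qmax_next = q_next.max(dim=1).values     # best next-state Q-value per sample
    qvalues_next_mask = (~done).float()      # 0.0 for terminal transitions, 1.0 otherwise
    return rewards + discount_factor * qmax_next * qvalues_next_mask

# Illustrative values only, not data from the original run
rewards = torch.tensor([1.0, 0.0])
q_next = torch.tensor([[2.0, 4.0], [3.0, 1.0]])
done = torch.tensor([False, True])
print(q_learning_targets(rewards, q_next, done))
```

For the terminal sample, the mask zeroes out the bootstrapped term, so its target is just the reward.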
The result of this flows into a loss criterion. When the loss is nan, I print out a bunch of diagnostic information, including:
import math
print('qtargets is nan? ', math.isnan(qtargets.sum().item()))
print('qvalues_next_mask is nan? ', math.isnan(qvalues_next_mask.sum().item()))
print('qvalues_next is nan? ', math.isnan(qvalues_next.sum().item()))
print('qmax_next is nan? ', math.isnan(qmax_next.sum().item()))
df_qm = p.discount_factor * qmax_next
print('df_qm is nan? ', math.isnan(df_qm.sum().item()))
qm_qvnm = qmax_next * qvalues_next_mask
print('qm_qvnm is nan? ', math.isnan(qm_qvnm.sum().item()))
df_qm_qvnm = p.discount_factor * qmax_next * qvalues_next_mask
print('df_qm_qvnm is nan? ', math.isnan(df_qm_qvnm.sum().item()))
The results of these prints vary from run to run, but include, for example:
qtargets is nan? True
qvalues_next_mask is nan? False
qvalues_next is nan? False
qmax_next is nan? False
df_qm is nan? False
qm_qvnm is nan? True
df_qm_qvnm is nan? True
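The checks above reduce each tensor to a single sum and test only that sum for nan. A tensor that contains inf but no nan sums to inf (unless it holds both +inf and -inf, which sum to nan), so it passes `math.isnan` while still holding non-finite values. A per-element check that reports nan and inf separately might look like the sketch below. `report_nonfinite` is a hypothetical helper name, and the example tensor uses made-up values, not data from the original run.

```python
import torch

def report_nonfinite(name, t):
    """Hypothetical helper: print how many elements of t are nan, +inf or -inf."""
    n_nan = torch.isnan(t).sum().item()
    n_posinf = torch.isposinf(t).sum().item()
    n_neginf = torch.isneginf(t).sum().item()
    print(f"{name}: nan={n_nan} +inf={n_posinf} -inf={n_neginf}")
    return n_nan, n_posinf, n_neginf

# Made-up example: one element is +inf, none is nan
qmax_next = torch.tensor([1.0, float("inf"), 2.0])
print(torch.isnan(qmax_next.sum()).item())  # the sum is inf, not nan, so this is False
report_nonfinite("qmax_next", qmax_next)
```

The helper works the same way on CUDA tensors; `.item()` copies each count back to the host.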
p.discount_factor is a scalar float (0.5). All of these tensors are torch CUDA tensors.
How can qmax_next * qvalues_next_mask be nan when neither factor is nan?