NaNs in input data breaking gradients

Unfortunately, any nan will create nan for any number it touches. So they have a tendancy to propagate. And this is the expected behavior here.
You definitely want to perform the masking before using them in any computations as much as possible.

2 Likes