'AngleBackward' returned nan values

I have a loss function that uses torch.angle(), but its argument can take any complex value, so even if I apply a trick like arg + 1e-7 or something along those lines, technically nothing prevents the argument from being exactly a value that produces a NaN during the backward pass.
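For reference, torch.angle(z) computes atan2(z.imag, z.real), whose partial derivatives are -z.imag / |z|² and z.real / |z|², so the backward pass divides by |z|² and hits 0/0 whenever the argument is exactly zero. A minimal sketch of the failure (the values are only for illustration):

import torch

# the gradient of angle divides by |z|**2, which is 0/0 at the origin
z = torch.zeros(1, dtype=torch.complex64, requires_grad=True)
torch.angle(z).sum().backward()
print(z.grad)  # NaN from AngleBackward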

What would be a clean solution to this problem? Can I just plug in torch.nan_to_num() and expect the backward pass to be happy, or should I switch to something like torch.atan(), manually adding +1e-7 to the denominator of the fraction, and replace torch.angle() with that? The __call__ of the class below is the one that is causing me issues:

Edit: I think it is solved now. The issue was denormals, which for some reason I forgot existed. I just set a threshold that corresponds to the same eps that TensorFlow uses and it seems to be fine. However, I found this:
https://pytorch.org/docs/stable/generated/torch.set_flush_denormal.html

but it mentions that it is hardware-dependent. What would be a good cross-platform way to manage this and avoid similar issues? Or is it safe to assume that it will be available on most up-to-date machines?
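For what it's worth, torch.set_flush_denormal() returns a bool saying whether the hardware accepted the setting, so the lack of support can at least be detected at runtime; torch.finfo() also exposes the smallest positive normal float, which could serve as a portable clamping threshold (a sketch, not a tested fix):

# returns False when the hardware (e.g. lacking x86 SSE3) cannot flush denormals
supported = torch.set_flush_denormal(True)

# smallest positive *normal* float32 (~1.18e-38), a portable threshold candidate
tiny = torch.finfo(torch.float32).tiny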

import torch


class ComplexCompressedMSELoss:
    def __init__(self,
                 c_: float = 0.3,
                 lambda_: float = 0.3,
                 eps: float = 1e-7):
        super().__init__()
        self.c_ = c_
        self.lambda_ = lambda_
        self.eps = eps
 
    def __call__(self, y_pred_mask, x_complex, y_complex):
        # get target magnitude and phase
        y_mag = torch.abs(y_complex)
        y_phase = torch.angle(y_complex)

        # predicted complex stft
        y_pred_mask = y_pred_mask.squeeze(1).permute(0, 2, 1)
        y_pred_complex = y_pred_mask.type(torch.complex64) * x_complex

        # get predicted magnitude and phase
        y_pred_mag = torch.abs(y_pred_complex)
        y_pred_phase = torch.angle(y_pred_complex)

        # target complex exponential        
        y_complex_exp = (y_mag ** self.c_).type(torch.complex64) * \
                torch.exp(1j * y_phase.type(torch.complex64))

        # predicted complex exponential
        y_pred_complex_exp = (y_pred_mag ** self.c_).type(torch.complex64) * \
                torch.exp(1j * y_pred_phase.type(torch.complex64))

        # magnitude only loss component
        mag_loss = torch.abs(y_mag ** self.c_ - y_pred_mag ** self.c_) ** 2
        mag_loss = torch.sum(mag_loss, dim=[1, 2])

        # complex loss component
        complex_loss = torch.abs(y_complex_exp - y_pred_complex_exp) ** 2
        complex_loss = torch.sum(complex_loss, dim=[1, 2])

        # blend both loss components
        loss = (1 - self.lambda_) * mag_loss + (self.lambda_) * complex_loss

        # returns the mean blended loss of the batch
        return torch.mean(loss)
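For completeness, a hypothetical call with made-up shapes, a (batch, 1, freq, time) mask and (batch, time, freq) complex STFTs, just to show how the class is wired up:

loss_fn = ComplexCompressedMSELoss(c_=0.3, lambda_=0.3)
y_pred_mask = torch.rand(4, 1, 257, 100)                     # real-valued mask
x_complex = torch.randn(4, 100, 257, dtype=torch.complex64)  # noisy STFT
y_complex = torch.randn(4, 100, 257, dtype=torch.complex64)  # clean STFT
loss = loss_fn(y_pred_mask, x_complex, y_complex)            # scalar batch mean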

Hi eagomez,
May I ask how you solved the issue with the NaN gradient from AngleBackward?
I am also encountering a NaN issue with torch.angle().
I also tried torch.set_flush_denormal(mode), but the issue still exists.
If you have any other solutions you tried, that would be very helpful for me and for others facing the same issue.

Hi MinkyuChoi,

I just replaced the potential denormals with a threshold that worked in my experiments, as follows:

def replace_denormals(x: torch.Tensor, threshold: float = 1e-10) -> torch.Tensor:
    # replace values in (-threshold, threshold) with the threshold itself,
    # operating on a clone so the input tensor is left untouched
    y = x.clone()
    y[(x < threshold) & (x > -threshold)] = threshold
    return y
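For instance, it can be applied to the real and imaginary parts before computing the phase (a sketch; the variable names are hypothetical):

y_real = replace_denormals(y_complex.real)
y_imag = replace_denormals(y_complex.imag)
y_phase = torch.atan2(y_imag, y_real)  # same as torch.angle, but guarded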

Thank you Esteban Gomez, your solution works for me. The NaN issue from AngleBackward and torch.angle() seems to be gone. However, I still don't understand why AngleBackward returns NaN. I will think about it further and post a new discussion. Thank you again for your help.

Glad to know it worked and thanks for the follow-up!