Hi all. I am trying to run the most basic single-layer RNN with complex inputs on the nightly build of PyTorch (1.10.0, CPU). The problem is that the gradient always evaluates to NaN. I’ve tried all the recurrent layers (RNN, GRU, LSTM) with the same result. Here is the model:
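(This is a trimmed-down sketch rather than my full script: the layer sizes are placeholders and the random tensors just stand in for my actual dataset, but the single complex-valued recurrent layer and the DataLoader setup are the parts that matter.)

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

input_size, hidden_size, batch_size = 4, 8, 32  # placeholder sizes

class ComplexRNN(nn.Module):
    def __init__(self):
        super().__init__()
        # single recurrent layer with complex-valued parameters (dtype factory kwarg)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers=1,
                          batch_first=True, dtype=torch.cfloat)
        self.fc = nn.Linear(hidden_size, 1, dtype=torch.cfloat)

    def forward(self, x):
        x = torch.unsqueeze(x, 0)      # add a leading dimension
        out, _ = self.rnn(x)
        return self.fc(out[:, -1])     # last step of the recurrent output

# random stand-in for my complex-valued dataset
data = torch.randn(1024, input_size, dtype=torch.cfloat)
target = torch.randn(1024, 1, dtype=torch.cfloat)
loader = DataLoader(TensorDataset(data, target), batch_size=batch_size)

model = ComplexRNN()
for x, y in loader:
    loss = (model(x) - y).abs().mean()   # reduce to a real-valued scalar
    loss.backward()
    print(model.rnn.weight_ih_l0.grad)   # the gradient comes out as NaN here
    break
```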
Are you able to reproduce this issue using random inputs? If so, could you post the random tensor initialization so that we could try to reproduce it, please?
Yes, that could be the case, but note that your code snippet uses the specified batch_size as the sequence length: by calling x = torch.unsqueeze(x, 0) you create a hard-coded batch size of 1, while the batch_size specified in the DataLoader is moved to dim1 and is therefore interpreted as the sequence length.
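To make the shape issue concrete, here is a small example with made-up sizes (the dtype doesn’t matter for the shape semantics, so plain floats are used):

```python
import torch
import torch.nn as nn

batch_size, input_size, hidden_size = 32, 4, 8
rnn = nn.RNN(input_size, hidden_size, batch_first=True)

# one batch from the DataLoader: [batch_size, input_size]
x = torch.randn(batch_size, input_size)

# unsqueeze(x, 0) -> [1, 32, 4]: batch size 1, sequence length 32
out, _ = rnn(torch.unsqueeze(x, 0))
print(out.shape)  # torch.Size([1, 32, 8])

# if the 32 samples are a batch (not a sequence), unsqueeze dim1 instead:
# [32, 1, 4] -> batch size 32, sequence length 1
out, _ = rnn(torch.unsqueeze(x, 1))
print(out.shape)  # torch.Size([32, 1, 8])
```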
Yes, this is exactly why I assumed it to be an issue of exploding gradients: the error appears when the sequence length reaches a critical threshold.
Is this expected behavior or a bug?
The same behavior does not occur with real numbers for the same model, though. Is there a theoretical reason why complex numbers are more prone to exploding gradients?
Also, I’ve tested the same setup with LSTM and GRU, and there the cutoff is around a sequence length of 100, meaning they delay the onset of exploding gradients compared to the plain RNN but do not prevent it. The same holds with L2-norm gradient clipping.
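For reference, this is roughly how I’m probing the cutoff; the layer sizes and the max_norm value are just the ones I happened to pick, and the clipping line is the standard torch.nn.utils.clip_grad_norm_ call:

```python
import torch
import torch.nn as nn

input_size, hidden_size = 4, 8  # placeholder sizes

for cell_cls in (nn.RNN, nn.GRU, nn.LSTM):
    for seq_len in (10, 50, 100, 200):
        net = cell_cls(input_size, hidden_size, batch_first=True, dtype=torch.cfloat)
        x = torch.randn(1, seq_len, input_size, dtype=torch.cfloat)
        out, _ = net(x)
        out.abs().sum().backward()  # real-valued scalar loss

        # standard L2-norm clipping; once the grads are already NaN it cannot recover them
        torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=1.0,
                                       error_if_nonfinite=False)

        max_grad = net.weight_hh_l0.grad.abs().max().item()
        print(f"{cell_cls.__name__}, seq_len={seq_len}: max |grad| = {max_grad}")
```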