Hi all. I am trying to run the most basic single-layer RNN with complex inputs on the nightly build of PyTorch (1.10.0-CPU). The problem is that the gradient always evaluates to NaN. I've tried all the recurrent layers (RNN, GRU, LSTM) with the same result. Here is the model:
```python
class CRNN(torch.nn.Module):
    def __init__(self, input_dim: int, output_dim: int, **kwargs):
        super().__init__(**kwargs)
        self.model = torch.nn.RNN(
            input_dim, output_dim, batch_first=True, dtype=torch.cfloat
        )
        self.loss = torch.nn.L1Loss()
        self.optimizer = torch.optim.Adam(self.parameters())

    def forward(self, x):
        x = torch.unsqueeze(x, 0)
        output, _ = self.model(x)  # RNN returns (output, h_n); keep only the output
        return output

    def fit(self, x, y):
        self.optimizer.zero_grad()
        z = torch.squeeze(self(x), 0)
        loss = self.loss(z, y)
        loss.backward()
        self.optimizer.step()
        return loss.item()
```
I also ran with torch.autograd.set_detect_anomaly(True), and it gave the following results:
- For RNN, the first NaN appears in
- For GRU and LSTM, the first NaN appears in
Since I have no idea how to interpret this, I am left to hope for help from the community.
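In case it helps anyone reproduce or debug this, here is a small diagnostic I've been using to find which parameters end up with NaN gradients after a backward pass. The `find_nan_grads` helper is my own sketch (not part of PyTorch); the demo below runs it on a plain real-valued `Linear` layer as a stand-in, since that path is known to produce finite gradients:

```python
import torch

def find_nan_grads(model: torch.nn.Module):
    """Return the names of parameters whose gradient contains any NaN."""
    bad = []
    for name, p in model.named_parameters():
        if p.grad is not None and torch.isnan(p.grad).any():
            bad.append(name)
    return bad

# Demo on a small real-valued model (stand-in for the complex RNN above):
model = torch.nn.Linear(4, 2)
model(torch.randn(3, 4)).sum().backward()
print(find_nan_grads(model))  # → [] (all gradients finite here)
```

Running the same check on the CRNN above right after `loss.backward()` is how I confirmed that every parameter's gradient is NaN, not just one.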