An error during backward about a singular matrix

First, here is the error traceback:

File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/run.py", line 18, in <module>
    main(prog="allennlp")
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 70, in main
    args.func(args)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/commands/train.py", line 101, in train_model_from_args
    args.recover)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/commands/train.py", line 131, in train_model_from_file
    return train_model(params, serialization_dir, file_friendly_logging, recover)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/commands/train.py", line 321, in train_model
    metrics = trainer.train()
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/training/trainer.py", line 749, in train
    train_metrics = self._train_epoch(epoch)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/allennlp/training/trainer.py", line 493, in _train_epoch
    loss.backward()
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: MAGMA getrf : U(4,4) is 0, U is singular at /opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/generic/THCTensorMathMagma.cu:413

I calculate the softmax of head_num vectors and get a batch of matrices of shape (batch_size, head_num, encoding_dim), which I call A. Then I calculate:
S = torch.bmm(A, A.transpose(2,1)) - torch.eye(head_num).cuda()
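For reference, here is a minimal self-contained sketch of how A and S are built. The sizes and the random input scores are hypothetical, and I am assuming the softmax is taken over the last (encoding_dim) dimension. It runs on the CPU, so there are no .cuda() calls.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, head_num, encoding_dim = 3, 4, 6

# Stand-in for the real attention scores (assumption: random values).
scores = torch.randn(batch_size, head_num, encoding_dim)

# Assumption: the softmax normalizes each head's vector over encoding_dim.
A = torch.softmax(scores, dim=-1)

# Identity created with the same device and dtype as A, so no explicit .cuda().
eye = torch.eye(head_num, device=A.device, dtype=A.dtype)

# A @ A^T has shape (batch_size, head_num, head_num); eye broadcasts over the batch.
S = torch.bmm(A, A.transpose(2, 1)) - eye

print(S.shape)
```

Creating the identity with device=A.device keeps the snippet working on both CPU and GPU.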
Finally, I calculate the determinant of each S[i] (S is named loss_matrix in my code) and add it to the final loss:

self_attention_loss = torch.tensor(0, dtype=torch.float).cuda()
for i in range(batch_size):
    self_attention_loss += torch.det(loss_matrix[i])
loss += (self_attention_loss * self._loss_rate / torch.tensor(batch_size, dtype=torch.float).cuda())
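The loop above could probably also be written without iterating over the batch. This is only a sketch, assuming a recent PyTorch version in which torch.det accepts a batch of square matrices; the function name self_attention_penalty and the example inputs are hypothetical.

```python
import torch

def self_attention_penalty(loss_matrix, loss_rate):
    """Mean determinant over the batch, scaled by loss_rate.

    loss_matrix: tensor of shape (batch_size, head_num, head_num).
    Assumes torch.det supports batched input (recent PyTorch versions).
    """
    dets = torch.det(loss_matrix)   # shape: (batch_size,)
    return loss_rate * dets.mean()  # same as sum(dets) * loss_rate / batch_size

# Example with random (almost surely non-singular) matrices.
torch.manual_seed(0)
example = torch.randn(5, 4, 4)
print(self_attention_penalty(example, loss_rate=0.1))
```

This computes the same quantity as the loop, so it does not by itself avoid the singular-matrix error in backward.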

The following test code seems to work fine:

import torch

a = torch.Tensor([[0,0],[0,0]])
a.requires_grad=True
b = torch.det(a)
print(b)
b.backward()
print(a.grad)

The output is:

tensor(0., grad_fn=<DetBackward>)
tensor([[0., 0.],
        [0., 0.]])
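Note that this test runs on the CPU with a 2x2 matrix, while the traceback comes from the CUDA (MAGMA) code path and mentions U(4,4), so the two cases may not behave the same way. To check whether the matrices in a batch are actually singular or close to it, one option is to look at the smallest singular value of each matrix. This is a sketch, assuming a PyTorch version that provides torch.linalg.svdvals; the helper name smallest_singular_values is hypothetical.

```python
import torch

def smallest_singular_values(mats):
    """Return the smallest singular value of each matrix in a batch.

    mats: tensor of shape (batch_size, n, n).
    torch.linalg.svdvals returns singular values in descending order,
    so the last entry is the smallest one.
    """
    return torch.linalg.svdvals(mats)[..., -1]

# One well-conditioned matrix (identity) and one singular matrix (all zeros).
mats = torch.stack([torch.eye(4), torch.zeros(4, 4)])
print(smallest_singular_values(mats))
```

A value at or near zero for some batch element would mean that matrix is singular or nearly singular.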

So I don't know what the cause is. Is it because the determinant is zero?
Thanks a lot, everyone!